From YARD at il.ibm.com Sun Jul 1 18:12:04 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 1 Jul 2018 20:12:04 +0300 Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi Just check : 1) getenfore - Selinux status 2) check if FW is active - iptables -L 3) do u have ping to the host report in mmlscluster ? /etc/hosts valid ? DNS is valid ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Uwe Falke" To: renata at SLAC.STANFORD.EDU, gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 06/28/2018 10:45 AM Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Just some ideas what to try. when you attempted mmdelnode, was that node still active with the IP address known in the cluster? If so, shut it down and try again. Mind the restrictions of mmdelnode though (can't delete NSD servers). Try to fake one of the currently missing cluster nodes, or restore the old system backup to the reinstalled server, if available, or temporarily install gpfs SW there and copy over the GPFS config stuff from a node still active (/var/mmfs/), configure the admin and daemon IFs of the faked node on that machine, then try to start the cluster and see if it comes up with quorum, if it does then go ahead and cleanly de-configure what's needed to remove that node from the cluster gracefully. Once you reach quorum with the remaining nodes you are in safe area. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Renata Maria Dart To: Simon Thompson Cc: gpfsug main discussion list Date: 27/06/2018 21:30 Subject: Re: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Simon, yes I ran mmsdrrestore -p and that helped to create the /var/mmfs/ccr directory which was missing. But it didn't create a ccr.nodes file, so I ended up scp'ng that over by hand which I hope was the right thing to do. The one host that is no longer in service is still in that ccr.nodes file and when I try to mmdelnode it I get: root at ocio-gpu03 renata]# mmdelnode -N dhcp-os-129-164.slac.stanford.edu mmdelnode: Unable to obtain the GPFS configuration file lock. mmdelnode: GPFS was unable to obtain a lock from node dhcp-os-129-164.slac.stanford.edu. mmdelnode: Command failed. Examine previous error messages to determine cause. despite the fact that it doesn't respond to ping. The mmstartup on the newly reinstalled node fails as in my initial email. I should mention that the two "working" nodes are running 4.2.3.4. 
The person who reinstalled the node that won't start up put on 4.2.3.8. I didn't think that was the cause of this problem though and thought I would try to get the cluster talking again before upgrading the rest of the nodes or degrading the reinstalled one. Thanks, Renata On Wed, 27 Jun 2018, Simon Thompson wrote: >Have you tried running mmsdrestore in the reinstalled node to reads to the cluster and then try and startup gpfs on it? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1pdg_mmsdrrest.htm > >Simon >________________________________________ >From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Renata Maria Dart [renata at slac.stanford.edu] >Sent: 27 June 2018 19:09 >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From YARD at il.ibm.com Sun Jul 1 18:17:42 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 1 Jul 2018 20:17:42 +0300 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Message-ID: Hi There is was issue with Scale 5.x GUI error - ib_rdma_nic_unrecognized(mlx5_0/2) Check if you have the patch: [root at gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py 229c229,230 < recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) --- > #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) > recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm)) And restart the - mmsysmoncontrol restart Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 06/28/2018 11:16 AM Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Sent by: gpfsug-discuss-bounces at spectrumscale.org Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! 
verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From oehmes at gmail.com Mon Jul 2 06:26:16 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 2 Jul 2018 07:26:16 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi, most traditional raid controllers can't deal well with blocksizes above 4m, which is why the new default is 4m and i would leave it at that unless you know for sure you get better performance with 8mb which typically requires your raid controller volume full block size to be 8mb with maybe a 8+2p @1mb strip size (many people confuse strip size with full track size) . if you don't have dedicated SSDs for metadata i would recommend to just use a 4mb blocksize with mixed data and metadata disks, if you have a reasonable number of SSD's put them in a raid 1 or raid 10 and use them as dedicated metadata and the other disks as dataonly , but i would not use the --metadata-block-size parameter as it prevents the datapool to use large number of subblocks. 
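To make that concrete, a minimal sketch of such a layout (the NSD names, device paths and server names below are placeholders, not taken from this thread) could look like this:

# nsd.stanza -- SSD mirrors carry metadata only, the remaining disks carry data only
%nsd: device=/dev/mapper/ssd_mirror_0 nsd=meta01 servers=nsd01,nsd02 usage=metadataOnly failureGroup=1 pool=system
%nsd: device=/dev/mapper/ssd_mirror_1 nsd=meta02 servers=nsd02,nsd01 usage=metadataOnly failureGroup=2 pool=system
%nsd: device=/dev/mapper/raid_vol_0 nsd=data01 servers=nsd01,nsd02 usage=dataOnly failureGroup=1 pool=system
%nsd: device=/dev/mapper/raid_vol_1 nsd=data02 servers=nsd02,nsd01 usage=dataOnly failureGroup=2 pool=system

mmcrnsd -F nsd.stanza
mmcrfs fs1 -F nsd.stanza -B 4M -m 2 -M 2 -r 1 -R 2 -T /gpfs/fs1

Because no --metadata-block-size is given, the subblock size is derived from the single 4 MiB block size, so both data and metadata keep 8 KiB subblocks (512 per block).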
as long as your SSDs are on raid 1 or 10 there is no read/modify/write penalty, so using them with the 4mb blocksize has no real negative impact at least on controllers i have worked with. hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > On 6/26/18 12:21 AM, Sven Oehme wrote: > > Joseph, > > the subblocksize will be derived from the smallest blocksize in the > filesytem, given you specified a metadata block size of 512k thats what > will be used to calculate the number of subblocks, even your data pool is > 4mb. > is this setup for a traditional NSD Setup or for GNR as the > recommendations would be different. > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: > >> Quick question, anyone know why GPFS wouldn't respect the default for >> the subblocks-per-full-block parameter when creating a new filesystem? >> I'd expect it to be set to 512 for an 8MB block size but my guess is >> that also specifying a metadata-block-size is interfering with it (by >> being too small). This was a parameter recommended by the vendor for a >> 4.2 installation with metadata on dedicated SSDs in the system pool, any >> best practices for 5.0? I'm guessing I'd have to bump it up to at least >> 4MB to get 512 subblocks for both pools. >> >> fs1 created with: >> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >> /gpfs/fs1 >> >> # mmlsfs fs1 >> >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) >> size in bytes (system pool) >> 131072 Minimum fragment (subblock) >> size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> >> -B 524288 Block size (system pool) >> 8388608 Block size (other pools) >> >> -V 19.01 (5.0.1.0) File system version >> >> --subblocks-per-full-block 64 Number of subblocks per >> full block >> -P system;DATA Disk storage pools in file >> system >> >> >> Thanks! >> --Joey Mendoza >> NCAR >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantllama at gmail.com Mon Jul 2 07:55:07 2018 From: mutantllama at gmail.com (Carl) Date: Mon, 2 Jul 2018 16:55:07 +1000 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi Sven, What is the resulting indirect-block size with a 4mb metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k? Cheers, Carl. On Mon, 2 Jul 2018 at 15:26, Sven Oehme wrote: > Hi, > > most traditional raid controllers can't deal well with blocksizes above > 4m, which is why the new default is 4m and i would leave it at that unless > you know for sure you get better performance with 8mb which typically > requires your raid controller volume full block size to be 8mb with maybe a > 8+2p @1mb strip size (many people confuse strip size with full track > size) . 
> if you don't have dedicated SSDs for metadata i would recommend to just > use a 4mb blocksize with mixed data and metadata disks, if you have a > reasonable number of SSD's put them in a raid 1 or raid 10 and use them as > dedicated metadata and the other disks as dataonly , but i would not use > the --metadata-block-size parameter as it prevents the datapool to use > large number of subblocks. > as long as your SSDs are on raid 1 or 10 there is no read/modify/write > penalty, so using them with the 4mb blocksize has no real negative impact > at least on controllers i have worked with. > > hope this helps. > > On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > >> Hi, it's for a traditional NSD setup. >> >> --Joey >> >> On 6/26/18 12:21 AM, Sven Oehme wrote: >> >> Joseph, >> >> the subblocksize will be derived from the smallest blocksize in the >> filesytem, given you specified a metadata block size of 512k thats what >> will be used to calculate the number of subblocks, even your data pool is >> 4mb. >> is this setup for a traditional NSD Setup or for GNR as the >> recommendations would be different. >> >> sven >> >> On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: >> >>> Quick question, anyone know why GPFS wouldn't respect the default for >>> the subblocks-per-full-block parameter when creating a new filesystem? >>> I'd expect it to be set to 512 for an 8MB block size but my guess is >>> that also specifying a metadata-block-size is interfering with it (by >>> being too small). This was a parameter recommended by the vendor for a >>> 4.2 installation with metadata on dedicated SSDs in the system pool, any >>> best practices for 5.0? I'm guessing I'd have to bump it up to at least >>> 4MB to get 512 subblocks for both pools. >>> >>> fs1 created with: >>> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >>> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >>> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >>> /gpfs/fs1 >>> >>> # mmlsfs fs1 >>> >>> >>> flag value description >>> ------------------- ------------------------ >>> ----------------------------------- >>> -f 8192 Minimum fragment (subblock) >>> size in bytes (system pool) >>> 131072 Minimum fragment (subblock) >>> size in bytes (other pools) >>> -i 4096 Inode size in bytes >>> -I 32768 Indirect block size in bytes >>> >>> -B 524288 Block size (system pool) >>> 8388608 Block size (other pools) >>> >>> -V 19.01 (5.0.1.0) File system version >>> >>> --subblocks-per-full-block 64 Number of subblocks per >>> full block >>> -P system;DATA Disk storage pools in file >>> system >>> >>> >>> Thanks! >>> --Joey Mendoza >>> NCAR >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Mon Jul 2 08:46:25 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 2 Jul 2018 09:46:25 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Jul 2 08:55:10 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 2 Jul 2018 09:55:10 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Olaf, he is talking about indirect size not subblock size . Carl, here is a screen shot of a 4mb filesystem : [root at p8n15hyp ~]# mmlsfs all_local File system attributes for /dev/fs2-4m-07: ========================================== flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 4194304 Block size -Q none Quotas accounting enabled none Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Mon Jun 18 12:30:54 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 4000000000 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in file system -A no Automatic mount option -o none Additional mount options -T /gpfs/fs2-4m-07 Default mount point --mount-priority 0 Mount priority as you can see indirect size is 32k sven On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: > HI Carl, > 8k for 4 M Blocksize > files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at > least one "subblock" be allocated .. > > in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is > retrieved from the blocksize ... > since R >5 (so new created file systems) .. the new default block size is > 4 MB, fragment size is 8k (512 subblocks) > for even larger block sizes ... more subblocks are available per block > so e.g. > 8M .... 1024 subblocks (fragment size is 8 k again) > > @Sven.. correct me, if I'm wrong ... > > > > > > > From: Carl > > To: gpfsug main discussion list > Date: 07/02/2018 08:55 AM > Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Sven, > > What is the resulting indirect-block size with a 4mb metadata block size? 
> > Does the new sub-block magic mean that it will take up 32k, or will it > occupy 128k? > > Cheers, > > Carl. > > > On Mon, 2 Jul 2018 at 15:26, Sven Oehme <*oehmes at gmail.com* > > wrote: > Hi, > > most traditional raid controllers can't deal well with blocksizes above > 4m, which is why the new default is 4m and i would leave it at that unless > you know for sure you get better performance with 8mb which typically > requires your raid controller volume full block size to be 8mb with maybe a > 8+2p @1mb strip size (many people confuse strip size with full track size) . > if you don't have dedicated SSDs for metadata i would recommend to just > use a 4mb blocksize with mixed data and metadata disks, if you have a > reasonable number of SSD's put them in a raid 1 or raid 10 and use them as > dedicated metadata and the other disks as dataonly , but i would not use > the --metadata-block-size parameter as it prevents the datapool to use > large number of subblocks. > as long as your SSDs are on raid 1 or 10 there is no read/modify/write > penalty, so using them with the 4mb blocksize has no real negative impact > at least on controllers i have worked with. > > hope this helps. > > On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <*jam at ucar.edu* > > wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > > On 6/26/18 12:21 AM, Sven Oehme wrote: > Joseph, > > the subblocksize will be derived from the smallest blocksize in the > filesytem, given you specified a metadata block size of 512k thats what > will be used to calculate the number of subblocks, even your data pool is > 4mb. > is this setup for a traditional NSD Setup or for GNR as the > recommendations would be different. > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <*jam at ucar.edu* > > wrote: > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small). This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0? I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes (system pool) > 131072 Minimum fragment (subblock) > size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > > -B 524288 Block size (system pool) > 8388608 Block size (other pools) > > -V 19.01 (5.0.1.0) File system version > > --subblocks-per-full-block 64 Number of subblocks per > full block > -P system;DATA Disk storage pools in file > system > > > Thanks! 
> --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantllama at gmail.com Mon Jul 2 10:57:11 2018 From: mutantllama at gmail.com (Carl) Date: Mon, 2 Jul 2018 19:57:11 +1000 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Thanks Olaf and Sven, It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata) is no longer relevant for version 5. Any idea if its likely to be updated soon? The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms there any situations where you would recommend using less than the new default block size? Cheers, Carl. On Mon, 2 Jul 2018 at 17:55, Sven Oehme wrote: > Olaf, he is talking about indirect size not subblock size . > > Carl, > > here is a screen shot of a 4mb filesystem : > > [root at p8n15hyp ~]# mmlsfs all_local > > File system attributes for /dev/fs2-4m-07: > ========================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 4194304 Block size > -Q none Quotas accounting enabled > none Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Mon Jun 18 12:30:54 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 4000000000 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? 
> --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 512 Number of subblocks per full > block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in > file system > -A no Automatic mount option > -o none Additional mount options > -T /gpfs/fs2-4m-07 Default mount point > --mount-priority 0 Mount priority > > as you can see indirect size is 32k > > sven > > On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: > >> HI Carl, >> 8k for 4 M Blocksize >> files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at >> least one "subblock" be allocated .. >> >> in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is >> retrieved from the blocksize ... >> since R >5 (so new created file systems) .. the new default block size is >> 4 MB, fragment size is 8k (512 subblocks) >> for even larger block sizes ... more subblocks are available per block >> so e.g. >> 8M .... 1024 subblocks (fragment size is 8 k again) >> >> @Sven.. correct me, if I'm wrong ... >> >> >> >> >> >> >> From: Carl >> >> To: gpfsug main discussion list >> Date: 07/02/2018 08:55 AM >> Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> Hi Sven, >> >> What is the resulting indirect-block size with a 4mb metadata block size? >> >> Does the new sub-block magic mean that it will take up 32k, or will it >> occupy 128k? >> >> Cheers, >> >> Carl. >> >> >> On Mon, 2 Jul 2018 at 15:26, Sven Oehme <*oehmes at gmail.com* >> > wrote: >> Hi, >> >> most traditional raid controllers can't deal well with blocksizes above >> 4m, which is why the new default is 4m and i would leave it at that unless >> you know for sure you get better performance with 8mb which typically >> requires your raid controller volume full block size to be 8mb with maybe a >> 8+2p @1mb strip size (many people confuse strip size with full track size) . >> if you don't have dedicated SSDs for metadata i would recommend to just >> use a 4mb blocksize with mixed data and metadata disks, if you have a >> reasonable number of SSD's put them in a raid 1 or raid 10 and use them as >> dedicated metadata and the other disks as dataonly , but i would not use >> the --metadata-block-size parameter as it prevents the datapool to use >> large number of subblocks. >> as long as your SSDs are on raid 1 or 10 there is no read/modify/write >> penalty, so using them with the 4mb blocksize has no real negative impact >> at least on controllers i have worked with. >> >> hope this helps. >> >> On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <*jam at ucar.edu* >> > wrote: >> Hi, it's for a traditional NSD setup. >> >> --Joey >> >> >> On 6/26/18 12:21 AM, Sven Oehme wrote: >> Joseph, >> >> the subblocksize will be derived from the smallest blocksize in the >> filesytem, given you specified a metadata block size of 512k thats what >> will be used to calculate the number of subblocks, even your data pool is >> 4mb. >> is this setup for a traditional NSD Setup or for GNR as the >> recommendations would be different. >> >> sven >> >> On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <*jam at ucar.edu* >> > wrote: >> Quick question, anyone know why GPFS wouldn't respect the default for >> the subblocks-per-full-block parameter when creating a new filesystem? 
>> I'd expect it to be set to 512 for an 8MB block size but my guess is >> that also specifying a metadata-block-size is interfering with it (by >> being too small). This was a parameter recommended by the vendor for a >> 4.2 installation with metadata on dedicated SSDs in the system pool, any >> best practices for 5.0? I'm guessing I'd have to bump it up to at least >> 4MB to get 512 subblocks for both pools. >> >> fs1 created with: >> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >> /gpfs/fs1 >> >> # mmlsfs fs1 >> >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) >> size in bytes (system pool) >> 131072 Minimum fragment (subblock) >> size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> >> -B 524288 Block size (system pool) >> 8388608 Block size (other pools) >> >> -V 19.01 (5.0.1.0) File system version >> >> --subblocks-per-full-block 64 Number of subblocks per >> full block >> -P system;DATA Disk storage pools in file >> system >> >> >> Thanks! >> --Joey Mendoza >> NCAR >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lore at cscs.ch Mon Jul 2 14:50:37 2018 From: lore at cscs.ch (Lo Re Giuseppe) Date: Mon, 2 Jul 2018 13:50:37 +0000 Subject: [gpfsug-discuss] Zimon metrics details Message-ID: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. 
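For comparing the two directly, a minimal sketch (assuming the usual pmsensors/pmcollector setup; the exact option names may differ between releases, see the mmperfmon man page) is to pull both metrics from the collector in one query:

# last ten one-minute buckets for both metrics
mmperfmon query gpfs_fs_bytes_read,gpfs_fis_bytes_read --bucket-size 60 --number-buckets 10

That shows the two series side by side, which makes the gap between them easy to see.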
Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jul 2 15:04:39 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 2 Jul 2018 07:04:39 -0700 Subject: [gpfsug-discuss] Zimon metrics details In-Reply-To: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> References: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> Message-ID: <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone > On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: > > Hi everybody, > > I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. > Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) > > Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? > The SS probelm determination guide doens?t spend more than half a line for each. > > In particular I would like to understand the difference between these ones: > > - gpfs_fs_bytes_read > - gpfs_fis_bytes_read > > The second gives tipically higher values than the first one. > > Thanks for any hit. > > Regards, > > Giuseppe > > *********************************************************************** > > Giuseppe Lo Re > > CSCS - Swiss National Supercomputing Center > > Via Trevano 131 > > CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 > > Switzerland Email: giuseppe.lore at cscs.ch > > *********************************************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From agar at us.ibm.com Mon Jul 2 16:05:33 2018 From: agar at us.ibm.com (Eric Agar) Date: Mon, 2 Jul 2018 11:05:33 -0400 Subject: [gpfsug-discuss] Zimon metrics details In-Reply-To: <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> References: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> Message-ID: Hello Giuseppe, Following was my attempt to answer a similar question some months ago. When reading about the different viewpoints of the Zimon sensors, please note that gpfs_fis_bytes_read is a metric provided by the GPFSFileSystemAPI sensor, while gpfs_fs_bytes_read is a metric provided by the GPFSFileSystem sensor. Therefore, gpfs_fis_bytes_read reflects application reads, while gpfs_fs_bytes_read reflects NSD reads. The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. 
An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause the writing of disk blocks, due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. Eric M. Agar agar at us.ibm.com IBM Spectrum Scale Level 2 Software Defined Infrastructure, IBM Systems From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sandeep.patil at in.ibm.com Mon Jul 2 19:43:20 2018 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Mon, 2 Jul 2018 18:43:20 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on Spectrum Scale (Q2 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Mon Jul 2 21:17:26 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 2 Jul 2018 22:17:26 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi, Carl, Sven had mentioned the RMW penalty before which could make it beneficial to use smaller blocks. If you have traditional RAIDs and you go the usual route to do track sizes equal to the block size (stripe size = BS/n with n+p RAIDs), you may run into problems if your I/O are typically or very often smaller than a block because the controller needs to read the entire track, modifies it according to your I/O, and writes it back with the parity stripes. Example: with 4MiB BS and 8+2 RAIDS as NSDs, on each I/O smaller than 4MiB reaching an NSD the controller needs to read 4MiB into a buffer, modify it according to your I/O, calculate parity for the whole track and write back 5MiB (8 data stripes of 512kiB plus two parity stripes). In those cases you might be better off with smaller block sizes. In the above scenario, it might however still be ok to leave the block size at 4MiB and just reduce the track size of the RAIDs. One has to check how that affects performance, YMMV I'd say here. Mind that the ESS uses a clever way to mask these type of I/O from the n+p RS based vdisks, but even there one might need to think ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Carl To: gpfsug main discussion list Date: 02/07/2018 11:57 Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Olaf and Sven, It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata ) is no longer relevant for version 5. Any idea if its likely to be updated soon? The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms there any situations where you would recommend using less than the new default block size? Cheers, Carl. On Mon, 2 Jul 2018 at 17:55, Sven Oehme wrote: Olaf, he is talking about indirect size not subblock size . 
Carl, here is a screen shot of a 4mb filesystem : [root at p8n15hyp ~]# mmlsfs all_local File system attributes for /dev/fs2-4m-07: ========================================== flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 4194304 Block size -Q none Quotas accounting enabled none Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Mon Jun 18 12:30:54 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 4000000000 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in file system -A no Automatic mount option -o none Additional mount options -T /gpfs/fs2-4m-07 Default mount point --mount-priority 0 Mount priority as you can see indirect size is 32k sven On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: HI Carl, 8k for 4 M Blocksize files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at least one "subblock" be allocated .. in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is retrieved from the blocksize ... since R >5 (so new created file systems) .. the new default block size is 4 MB, fragment size is 8k (512 subblocks) for even larger block sizes ... more subblocks are available per block so e.g. 8M .... 1024 subblocks (fragment size is 8 k again) @Sven.. correct me, if I'm wrong ... From: Carl To: gpfsug main discussion list Date: 07/02/2018 08:55 AM Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Sven, What is the resulting indirect-block size with a 4mb metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k? Cheers, Carl. On Mon, 2 Jul 2018 at 15:26, Sven Oehme wrote: Hi, most traditional raid controllers can't deal well with blocksizes above 4m, which is why the new default is 4m and i would leave it at that unless you know for sure you get better performance with 8mb which typically requires your raid controller volume full block size to be 8mb with maybe a 8+2p @1mb strip size (many people confuse strip size with full track size) . 
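To put rough numbers on the read-modify-write penalty described above, here is a minimal sketch in plain bash arithmetic. It assumes the same 8+2 geometry with 512 KiB strips used in the example; the variable names are made up for illustration, and a real controller may only rewrite the affected strips rather than the whole track, so treat this as the worst case being described here.

# Back-of-the-envelope RMW cost of one small write landing on an n+p RAID
# track, following the full-track read/modify/write behaviour described above.
DATA_STRIPS=8        # n data strips per track
PARITY_STRIPS=2      # p parity strips per track
STRIP_KIB=512        # strip (chunk) size in KiB
IO_KIB=128           # size of the incoming write in KiB

TRACK_KIB=$((DATA_STRIPS * STRIP_KIB))                      # 4096 KiB read in
WRITTEN_KIB=$(((DATA_STRIPS + PARITY_STRIPS) * STRIP_KIB))  # 5120 KiB written back

echo "host wrote ${IO_KIB} KiB; controller read ${TRACK_KIB} KiB and wrote ${WRITTEN_KIB} KiB"
echo "back-end write amplification: $((WRITTEN_KIB / IO_KIB))x"

With a full 4 MiB write the same arithmetic needs no read at all, which is the point of matching the file system block size to the RAID track size.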
if you don't have dedicated SSDs for metadata i would recommend to just use a 4mb blocksize with mixed data and metadata disks, if you have a reasonable number of SSD's put them in a raid 1 or raid 10 and use them as dedicated metadata and the other disks as dataonly , but i would not use the --metadata-block-size parameter as it prevents the datapool to use large number of subblocks. as long as your SSDs are on raid 1 or 10 there is no read/modify/write penalty, so using them with the 4mb blocksize has no real negative impact at least on controllers i have worked with. hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: Hi, it's for a traditional NSD setup. --Joey On 6/26/18 12:21 AM, Sven Oehme wrote: Joseph, the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb. is this setup for a traditional NSD Setup or for GNR as the recommendations would be different. sven On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 131072 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -B 524288 Block size (system pool) 8388608 Block size (other pools) -V 19.01 (5.0.1.0) File system version --subblocks-per-full-block 64 Number of subblocks per full block -P system;DATA Disk storage pools in file system Thanks! 
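The fragment sizes in the mmlsfs output above follow directly from Sven's point that the subblock size is derived from the smallest block size in the file system. A small sketch of that arithmetic, using the values from this mmcrfs example (illustrative only):

# Subblock arithmetic for the file system created above.
META_BLOCK=$((512 * 1024))        # --metadata-block-size 512K (system pool)
DATA_BLOCK=$((8 * 1024 * 1024))   # -B 8M (data pools)
SUBBLOCK=8192                     # subblock size derived from the 512K block

echo "subblocks per full block: $((META_BLOCK / SUBBLOCK))"                      # 64
echo "system pool fragment:     ${SUBBLOCK} bytes"                               # 8192
echo "data pool fragment:       $((DATA_BLOCK * SUBBLOCK / META_BLOCK)) bytes"   # 131072

With a single 4 MiB block size and no separate metadata block size, the same logic gives 4 MiB / 8 KiB = 512 subblocks per block, which is what Sven's screen shot shows.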
--Joey Mendoza NCAR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From lore at cscs.ch Tue Jul 3 09:05:41 2018 From: lore at cscs.ch (Lo Re Giuseppe) Date: Tue, 3 Jul 2018 08:05:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 78, Issue 6 In-Reply-To: References: Message-ID: Dear Eric, thanks a lot for this information. And what about the gpfs_vfs metric group? What is the difference beteween for example ?gpfs_fis_read_calls" and ?gpfs_vfs_read? ? Again I see the second one being tipically higher than the first one. In addition gpfs_vfs_read is not related to a specific file system... [root at ela5 ~]# mmperfmon query gpfs_fis_read_calls -n1 -b 60 Legend: 1: ela5.cscs.ch|GPFSFilesystemAPI|durand.cscs.ch|store|gpfs_fis_read_calls 2: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|apps|gpfs_fis_read_calls 3: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|project|gpfs_fis_read_calls 4: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|users|gpfs_fis_read_calls Row Timestamp gpfs_fis_read_calls gpfs_fis_read_calls gpfs_fis_read_calls gpfs_fis_read_calls 1 2018-07-03-10:03:00 0 0 7274 0 [root at ela5 ~]# mmperfmon query gpfs_vfs_read -n1 -b 60 Legend: 1: ela5.cscs.ch|GPFSVFS|gpfs_vfs_read Row Timestamp gpfs_vfs_read 1 2018-07-03-10:03:00 45123 Cheers, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** Hello Giuseppe, Following was my attempt to answer a similar question some months ago. When reading about the different viewpoints of the Zimon sensors, please note that gpfs_fis_bytes_read is a metric provided by the GPFSFileSystemAPI sensor, while gpfs_fs_bytes_read is a metric provided by the GPFSFileSystem sensor. Therefore, gpfs_fis_bytes_read reflects application reads, while gpfs_fs_bytes_read reflects NSD reads. The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. 
Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause the writing of disk blocks, due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. Eric M. Agar agar at us.ibm.com IBM Spectrum Scale Level 2 Software Defined Infrastructure, IBM Systems From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 6 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... 
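A quick way to see the distinction Eric describes on a live system is to query the application-level and NSD-level byte counters for the same interval and compare them. The command form below simply mirrors the mmperfmon examples earlier in this thread; the closing comment is a rule of thumb, not an exhaustive list of causes.

# Application view (GPFSFilesystemAPI) vs. NSD view (GPFSFilesystem), one 60s bucket
mmperfmon query gpfs_fis_bytes_read -n1 -b 60
mmperfmon query gpfs_fs_bytes_read -n1 -b 60
# Differences between the two are normal: pagepool cache hits make the
# application-side number larger, while prefetch, replication and metadata
# I/O push the NSD-side number up.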
URL: From Cameron.Dunn at bristol.ac.uk Tue Jul 3 12:49:03 2018 From: Cameron.Dunn at bristol.ac.uk (Cameron Dunn) Date: Tue, 3 Jul 2018 11:49:03 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms Message-ID: HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Jul 3 20:37:08 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 3 Jul 2018 19:37:08 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 17:43:20 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 16:43:20 +0000 Subject: [gpfsug-discuss] High I/O wait times Message-ID: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. 
Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Jul 3 21:11:17 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 3 Jul 2018 16:11:17 -0400 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 22:41:17 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 21:41:17 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). 
In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Jul 3 22:53:19 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 3 Jul 2018 17:53:19 -0400 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Message-ID: How many NSDs are served by the NSD servers and what is your maximum file system block size? Have you confirmed that you have sufficient NSD worker threads to handle the maximum number of IOs you are configured to have active? That would be the number of NSDs served times 12 (you have 12 threads per queue). Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 05:41 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. 
I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
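For anyone following along, a compact way to eyeball the queue pressure Fred mentions is to pull just the queue counters out of the dump on an NSD server. mmfsadm is a low-level debug command and its output format changes between releases, so treat the grep pattern as a starting point rather than a stable interface.

# On an NSD server: show the NSD queues and how many I/Os are waiting on them
mmfsadm dump nsd | grep -iE 'queue|pending|threads'

# The tuning knobs discussed in this thread, as currently configured
# (per-node or node-class overrides show up as separate stanzas)
mmlsconfig nsdMaxWorkerThreads
mmlsconfig nsdThreadsPerQueue
mmlsconfig nsdSmallThreadRatio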
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 23:05:25 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 22:05:25 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Message-ID: <2CB5B62E-A40A-4C47-B2D1-137BE87FBDDA@vanderbilt.edu> Hi Fred, I have a total of 48 NSDs served up by 8 NSD servers. 12 of those NSDs are in our small /home filesystem, which is performing just fine. The other 36 are in our ~1 PB /scratch and /data filesystem, which is where the problem is. Our max filesystem block size parameter is set to 16 MB, but the aforementioned filesystem uses a 1 MB block size. nsdMaxWorkerThreads is set to 1024 as shown below. Since each NSD server serves an average of 6 NSDs and 6 x 12 = 72 we?re OK if I?m understanding the calculation correctly. Even multiplying 48 x 12 = 576, so we?re good?!? Your help is much appreciated! Thanks again? Kevin On Jul 3, 2018, at 4:53 PM, Frederick Stock > wrote: How many NSDs are served by the NSD servers and what is your maximum file system block size? Have you confirmed that you have sufficient NSD worker threads to handle the maximum number of IOs you are configured to have active? That would be the number of NSDs served times 12 (you have 12 threads per queue). Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/03/2018 05:41 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. 
And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock > wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110933587&sdata=RKuWKLRGoBRMSDHkrMsKsuU6JkiFgruK4e7gGafxAGc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
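If it helps anyone else chasing this kind of problem, a throwaway filter like the one below makes the slow I/Os stand out in the iohist output. The awk field number is an assumption based on the 4.2.x output layout (the service time in milliseconds); check "mmdiag --iohist | head" on your own release and adjust it.

# Print only I/Os slower than 1000 ms from the recent I/O history
mmdiag --iohist | awk '$7 ~ /^[0-9]/ && $7+0 > 1000'

# ...or sample every 30 seconds to see whether the slow NSDs move around
while true; do date; mmdiag --iohist | awk '$7 ~ /^[0-9]/ && $7+0 > 1000'; sleep 30; done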
URL: From scrusan at ddn.com Tue Jul 3 23:01:48 2018 From: scrusan at ddn.com (Steve Crusan) Date: Tue, 3 Jul 2018 22:01:48 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Kevin, While this is happening, are you able to grab latency stats per LUN (hardware vendor agnostic) to see if there are any outliers? Also, when looking at the mmdiag output, are both reads and writes affected? Depending on the storage hardware, your writes might be hitting cache, so maybe this problem is being exasperated by many small reads (that are too random to be coalesced, take advantage of drive NCQ, etc). The other response about the nsd threads is also a good start, but if the I/O waits shift between different NSD servers and across hardware vendors, my assumption would be that you are hitting a bottleneck somewhere, but what you are seeing is symptoms of I/O backlog, which can manifest at any number of places. This could be something as low level as a few slow drives. Have you just started noticing this behavior? Any new applications on your system? Going by your institution, you're probably supposing a wide variety of codes, so if these problems just started happening, its possible that someone changed their code, or decided to run new scientific packages. -Steve ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: Tuesday, July 03, 2018 11:43 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] High I/O wait times Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From taylorm at us.ibm.com Tue Jul 3 23:25:55 2018 From: taylorm at us.ibm.com (Michael L Taylor) Date: Tue, 3 Jul 2018 15:25:55 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 78, Issue 6 In-Reply-To: References: Message-ID: Hi Giuseppe, The GUI happens to document some of the zimon metrics in the KC here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monperfmetrics.htm Hopefully that gets you a bit more of what you need but does not cover everything. Today's Topics: 1. Zimon metrics details (Lo Re Giuseppe) 2. Re: Zimon metrics details (Kristy Kallback-Rose) 3. 
Re: Zimon metrics details (Eric Agar) From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Wed Jul 4 06:47:28 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 4 Jul 2018 05:47:28 +0000 Subject: [gpfsug-discuss] Filesystem Operation error Message-ID: <254f2811c2b14c9d8c82403d393d0178@SMXRF105.msg.hukrf.de> Hallo All, follow a short story from yesterday on Version 5.0.1.1. We had a 3 - Node cluster (2 Nodes for IO and the third for a quorum Buster function). A Admin make a mistake an take a delete of the 3 Node (VM). We restored ist with a VM Snapshot no Problem. The only point here we lost complete 7 desconly disk. We defined new one and want to delete this disk with mmdeldisk. On 6 Filesystems no problem but one has now a Problem. We delete this disk finaly with mmdeldisk fsname -p. And we see now after a successfully mmdelnsd the old disk already in following display. mmlsdisk tsmconf -L disk driver sector failure holds holds storage name type size group metadata data status availability disk id pool remarks ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- nsd_tsmconf001_DSK20 nsd 512 0 Yes Yes ready up 1 system desc nsd_g4_tsmconf nsd 512 2 No No removing refs down 2 system nsd_tsmconf001_DSK70 nsd 512 1 Yes Yes ready up 3 system desc nsd_g4_tsmconf1 nsd 512 2 No No ready up 4 system desc After that all fs-cmd geneate a fs operation error here like this. Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=3882673: Unrecoverable file system operation error. Status code 65536. Volume tsmconf Questions: 1. What does this mean ?removing refs?. Now we don?t have the possibility to handle these disk. The disk itself is no more existend, but in the stripegroup a referenz is available. nsd_g4_tsmconf: uid 0A885085:577BB637, status ReferencesBeingRemoved, availability Unavailable, created on node 10.136.80.133, Tue Jul 5 15:29:27 2016 type 'nsd', sector size 512, failureConfigVersion 424 quorum weight {0,0}, failure group: id 2, fg index 1 locality group: id 2, lg index 1 failureGroupStrP: (2), rackId 2, locationId 0, extLgId 0 nSectors 528384 (0:81000) (258 MB), inode0Sector 131072 alloc region: no of bits 0, seg num -1, offset 0, len 72 suballocator 0x18015B8A7A4 type 0 nBits 32 subSize 0 dataOffset 4 nRows 0 len/off: storage pool: 0 holds nothing sectors past efficient device boundary: 0 isFenced: 1 start Region No: -1 end Region No:-1 start AllocMap Record: -1 2. Are there any cmd to handle these? 3. 
Where can I find the Status code 65536? A PMR is also open. Any Hints? Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tees at us.ibm.com Wed Jul 4 03:43:28 2018 From: tees at us.ibm.com (Stephen M Tee) Date: Tue, 3 Jul 2018 21:43:28 -0500 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: You dont state whether your running GPFS or ESS and which level. One thing you can check, is whether the SES and enclosure drivers are being loaded. The lsmod command will show if they are. These drivers were found to cause SCSI IO hangs in Linux RH7.3 and 7.4. If they are being loaded, you can blacklist and unload them with no impact to ESS/GNR By default these drivers are blacklisted in ESS. Stephen Tee ESS Storage Development IBM Systems and Technology Austin, TX 512-963-7177 From: Steve Crusan To: gpfsug main discussion list Date: 07/03/2018 05:08 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Kevin, While this is happening, are you able to grab latency stats per LUN (hardware vendor agnostic) to see if there are any outliers? Also, when looking at the mmdiag output, are both reads and writes affected? Depending on the storage hardware, your writes might be hitting cache, so maybe this problem is being exasperated by many small reads (that are too random to be coalesced, take advantage of drive NCQ, etc). The other response about the nsd threads is also a good start, but if the I/O waits shift between different NSD servers and across hardware vendors, my assumption would be that you are hitting a bottleneck somewhere, but what you are seeing is symptoms of I/O backlog, which can manifest at any number of places. This could be something as low level as a few slow drives. Have you just started noticing this behavior? Any new applications on your system? 
Going by your institution, you're probably supposing a wide variety of codes, so if these problems just started happening, its possible that someone changed their code, or decided to run new scientific packages. -Steve ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: Tuesday, July 03, 2018 11:43 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] High I/O wait times Hi all, not We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Wed Jul 4 13:34:43 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 4 Jul 2018 08:34:43 -0400 (EDT) Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Hi Kevin, Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > Hi all, > We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. ?One of the > confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from > NSD to NSD (and storage array to storage array) whenever we check ? 
which is sometimes just a few minutes apart. > > In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. ?In our environment, the most common cause has > been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. ?But that?s *not* happening this time. > Is there anything within GPFS / outside of a hardware issue that I should be looking for?? ?Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu?- (615)875-9633 > > > > > From Renar.Grunenberg at huk-coburg.de Thu Jul 5 08:02:36 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 5 Jul 2018 07:02:36 +0000 Subject: [gpfsug-discuss] Filesystem Operation error In-Reply-To: <037a7d7f52bf4a6a83406c8c26fa4d82@SMXRF105.msg.hukrf.de> References: <037a7d7f52bf4a6a83406c8c26fa4d82@SMXRF105.msg.hukrf.de> Message-ID: <8fb424ee10404400ac6b81d985dd5bf9@SMXRF105.msg.hukrf.de> Hallo All, we fixed our Problem here with Spectrum Scale Support. The fixing cmd were ?mmcommon recoverfs tsmconf? and ?tsdeldisk tsmconf -d "nsd_g4_tsmconf". The final reason for this problem, if I want to delete a disk in a filesystem all disk must be reachable from the requesting host. In our config the NSD-Server had no NSD-Server Definitions and the Quorum Buster Node had no access to the SAN attached disk. A Recommendation from my site here are: This should be documented for a high available config with a 3 side implementation, or the cmds that want to update the nsd-descriptors for each disk should check are any disk reachable and don?t do a SG-Panic. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Mittwoch, 4. Juli 2018 07:47 An: 'gpfsug-discuss at spectrumscale.org' Betreff: Filesystem Operation error Hallo All, follow a short story from yesterday on Version 5.0.1.1. We had a 3 - Node cluster (2 Nodes for IO and the third for a quorum Buster function). A Admin make a mistake an take a delete of the 3 Node (VM). 
We restored ist with a VM Snapshot no Problem. The only point here we lost complete 7 desconly disk. We defined new one and want to delete this disk with mmdeldisk. On 6 Filesystems no problem but one has now a Problem. We delete this disk finaly with mmdeldisk fsname -p. And we see now after a successfully mmdelnsd the old disk already in following display. mmlsdisk tsmconf -L disk driver sector failure holds holds storage name type size group metadata data status availability disk id pool remarks ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- nsd_tsmconf001_DSK20 nsd 512 0 Yes Yes ready up 1 system desc nsd_g4_tsmconf nsd 512 2 No No removing refs down 2 system nsd_tsmconf001_DSK70 nsd 512 1 Yes Yes ready up 3 system desc nsd_g4_tsmconf1 nsd 512 2 No No ready up 4 system desc After that all fs-cmd geneate a fs operation error here like this. Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=3882673: Unrecoverable file system operation error. Status code 65536. Volume tsmconf Questions: 1. What does this mean ?removing refs?. Now we don?t have the possibility to handle these disk. The disk itself is no more existend, but in the stripegroup a referenz is available. nsd_g4_tsmconf: uid 0A885085:577BB637, status ReferencesBeingRemoved, availability Unavailable, created on node 10.136.80.133, Tue Jul 5 15:29:27 2016 type 'nsd', sector size 512, failureConfigVersion 424 quorum weight {0,0}, failure group: id 2, fg index 1 locality group: id 2, lg index 1 failureGroupStrP: (2), rackId 2, locationId 0, extLgId 0 nSectors 528384 (0:81000) (258 MB), inode0Sector 131072 alloc region: no of bits 0, seg num -1, offset 0, len 72 suballocator 0x18015B8A7A4 type 0 nBits 32 subSize 0 dataOffset 4 nRows 0 len/off: storage pool: 0 holds nothing sectors past efficient device boundary: 0 isFenced: 1 start Region No: -1 end Region No:-1 start AllocMap Record: -1 2. Are there any cmd to handle these? 3. Where can I find the Status code 65536? A PMR is also open. Any Hints? Regards Renar -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Thu Jul 5 09:28:51 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 5 Jul 2018 08:28:51 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> , Message-ID: <83A6EEB0EC738F459A39439733AE804526729376@MBX114.d.ethz.ch> Hello Daniel, I've solved my problem disabling the check (I've gpfs v4.2.3-5) by putting ib_rdma_enable_monitoring=False in the [network] section of the file /var/mmfs/mmsysmon/mmsysmonitor.conf, and restarting the mmsysmonitor. There was a thread in this group about this problem. 
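For anyone scripting the same workaround on several nodes, a minimal sketch is below. It assumes the file is plain INI (Python's configparser rewrites it and drops comments, hence the backup copy) and uses the "mmsysmoncontrol restart" command mentioned below to restart the monitor; adjust both to your release before using it.

import configparser
import shutil
import subprocess

CONF = "/var/mmfs/mmsysmon/mmsysmonitor.conf"

def disable_ib_rdma_check(path=CONF):
    """Set ib_rdma_enable_monitoring=False in the [network] section."""
    shutil.copy2(path, path + ".bak")   # keep a backup; configparser drops comments on rewrite
    cp = configparser.ConfigParser()
    cp.read(path)
    if not cp.has_section("network"):
        cp.add_section("network")
    if cp.get("network", "ib_rdma_enable_monitoring", fallback="") == "False":
        return False                    # already disabled, nothing to do
    cp.set("network", "ib_rdma_enable_monitoring", "False")
    with open(path, "w") as f:
        cp.write(f)
    return True

if __name__ == "__main__":
    if disable_ib_rdma_check():
        # restart the system health monitor so it re-reads the configuration
        subprocess.run(["mmsysmoncontrol", "restart"], check=True)

Run it as root on the affected nodes; the .bak copy makes it easy to revert once the underlying check is fixed.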
A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Yaron Daniel [YARD at il.ibm.com] Sent: Sunday, July 01, 2018 7:17 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Hi There is was issue with Scale 5.x GUI error - ib_rdma_nic_unrecognized(mlx5_0/2) Check if you have the patch: [root at gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py 229c229,230 < recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) --- > #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) > recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm)) And restart the - mmsysmoncontrol restart Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0B5B5F080B5B5954005EFD8BC22582BD] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_06EDAF6406EDA744005EFD8BC22582BD][cid:_1_06EDB16C06EDA744005EFD8BC22582BD] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 06/28/2018 11:16 AM Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! 
verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: ATT00001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 4376 bytes Desc: ATT00003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 5093 bytes Desc: ATT00004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00005.gif Type: image/gif Size: 4746 bytes Desc: ATT00005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 4557 bytes Desc: ATT00006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.gif Type: image/gif Size: 5093 bytes Desc: ATT00007.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00008.jpg Type: image/jpeg Size: 11294 bytes Desc: ATT00008.jpg URL: From michael.holliday at crick.ac.uk Wed Jul 4 12:37:52 2018 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 4 Jul 2018 11:37:52 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Hi All, Those commands show no errors not do any of the log files. GPFS has started correctly and showing the cluster and all nodes as up and active. We appear to have found the command that is hanging during the mount - However I'm not sure why its hanging. mmwmi mountedfilesystems Michael From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Yaron Daniel Sent: 20 June 2018 16:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Windows Mount Also what does mmdiag --network + mmgetstate -a show ? 
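If it is easier to capture those in one go, here is a small sketch that saves the output of both commands (plus the cluster view) to a timestamped file you can attach to the thread; it assumes the standard /usr/lpp/mmfs/bin location for the GPFS commands.

import datetime
import subprocess

GPFS_BIN = "/usr/lpp/mmfs/bin"
CMDS = [
    [GPFS_BIN + "/mmgetstate", "-a"],
    [GPFS_BIN + "/mmdiag", "--network"],
    [GPFS_BIN + "/mmlscluster"],
]

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
outfile = "gpfs-diag-" + stamp + ".txt"

with open(outfile, "w") as out:
    for cmd in CMDS:
        out.write("### " + " ".join(cmd) + "\n")
        # capture stdout and stderr together; do not abort if one command fails
        r = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        out.write(r.stdout + "\n")

print("wrote", outfile)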
Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.gif at 01D41393.D1DEB220] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:image004.gif at 01D41393.D1DEB220][cid:image005.gif at 01D41393.D1DEB220] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: "Yaron Daniel" > To: gpfsug main discussion list > Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.gif at 01D41393.D1DEB220] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:image004.gif at 01D41393.D1DEB220][cid:image005.gif at 01D41393.D1DEB220][https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Michael Holliday > To: "gpfsug-discuss at spectrumscale.org" > Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We've being trying to get the windows system to mount GPFS. We've set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing - GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 1851 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 4376 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 5093 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.gif Type: image/gif Size: 4746 bytes Desc: image004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.gif Type: image/gif Size: 4557 bytes Desc: image005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.gif Type: image/gif Size: 5093 bytes Desc: image006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 11294 bytes Desc: image007.jpg URL: From heiner.billich at psi.ch Thu Jul 5 17:00:08 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 5 Jul 2018 16:00:08 +0000 Subject: [gpfsug-discuss] -o syncnfs has no effect? Message-ID: Hello, I try to mount a fs with "-o syncnfs" as we'll export it with CES/Protocols. But I never see the mount option displayed when I do # mount | grep fs-name This is a remote cluster mount, we'll run the Protocol nodes in a separate cluster. On the home cluster I see the option 'nfssync' in the output of 'mount'. My conclusion is that the mount option "syncnfs" has no effect on remote cluster mounts. Which seems a bit strange? Please can someone clarify on this? What is the impact on protocol nodes exporting remote cluster mounts? Is there any chance of data corruption? Or are some mount options implicitely inherited from the home cluster? I've read 'syncnfs' is default on Linux, but I would like to know for sure. Funny enough I can pass arbitrary options with # mmmount -o some-garbage which are silently ignored. I did 'mmchfs -o syncnfs' on the home cluster and the syncnfs option is present in /etc/fstab on the remote cluster. I did not remount on all nodes __ Thank you, I'll appreciate any hints or replies. Heiner Versions: Remote cluster 5.0.1 on RHEL7.4 (imounts the fs and runs protocol nodes) Home cluster 4.2.3-8 on RHEL6 (export the fs, owns the storage) Filesystem: 17.00 (4.2.3.0) All Linux x86_64 with Spectrum Scale Standard Edition -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From emanners at fsu.edu Thu Jul 5 19:53:36 2018 From: emanners at fsu.edu (Edson Manners) Date: Thu, 5 Jul 2018 14:53:36 -0400 Subject: [gpfsug-discuss] GPFS GUI Message-ID: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> There was another thread on here about the following error in the GUI: Event name: gui_cluster_down Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. But it looks like the resolution happened in another channel. I have the exact same problem even though we're running a production GPFS cluster that seems to work perfectly fine. This is the last error in the GUI that I'm trying to get solved. What would be the best way to try to troubleshoot this. 
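A starting point, before digging into the GUI itself, is to compare what mmhealth reports against what the GUI shows. Below is a rough sketch that parses the machine-readable output; it assumes the usual colon-separated -Y format with a HEADER row, and the 'component' and 'status' field names are assumptions to verify against your release.

import subprocess

def parse_y_output(text):
    """Turn colon-separated -Y output into dicts keyed by the HEADER field names."""
    headers = {}
    rows = []
    for line in text.splitlines():
        parts = line.split(":")
        if len(parts) < 3:
            continue
        rectype = parts[1]
        if parts[2] == "HEADER":
            headers[rectype] = parts
        elif rectype in headers:
            rows.append(dict(zip(headers[rectype], parts)))
    return rows

out = subprocess.run(["/usr/lpp/mmfs/bin/mmhealth", "node", "show", "-Y"],
                     stdout=subprocess.PIPE, text=True, check=True).stdout
for row in parse_y_output(out):
    # 'component' and 'status' are assumed field names -- check the HEADER line
    status = row.get("status", "")
    if status and status != "HEALTHY":
        print(row.get("component", "?"), status)

If mmhealth itself is clean on every quorum node but the GUI still reports gui_cluster_down, that points at the GUI's own calculation rather than the cluster.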
-- [Any errors in spelling, tact or fact are transmission errors] - (Stolen from) Dag Wieers Edson Manners Research Computing Center FSU Information Technology Services Dirac Science Library., Room 150G Tallahassee, Florida 32306-4120 From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 6 02:11:17 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 6 Jul 2018 01:11:17 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 
348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. 
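(One more note for anyone who wants to produce a similar summary: a rough sketch of the parsing is below. The exact layout of "mmfsadm dump nsd" differs between releases, so the regular expressions are assumptions to adjust against real dump output rather than a drop-in script.)

import re
import subprocess
from collections import defaultdict

# These patterns are guesses at the relevant lines in "mmfsadm dump nsd";
# compare them against an actual dump on your release and adjust.
QUEUE_RE   = re.compile(r"type (SMALL|LARGE).*?(\d+) requests? pending.*?highest (?:was )?(\d+)", re.I)
THREADS_RE = re.compile(r"(\d+) threads? started.*?(\d+) threads? active.*?highest (?:was )?(\d+)", re.I)

def summarize(dump_text):
    # qtype -> [queues, pending, max pending, active threads, started threads]
    stats = defaultdict(lambda: [0, 0, 0, 0, 0])
    current = None
    for line in dump_text.splitlines():
        m = QUEUE_RE.search(line)
        if m:
            current = m.group(1).upper()
            s = stats[current]
            s[0] += 1
            s[1] += int(m.group(2))
            s[2] = max(s[2], int(m.group(3)))
            continue
        m = THREADS_RE.search(line)
        if m and current:
            s = stats[current]
            s[4] += int(m.group(1))
            s[3] += int(m.group(2))
    return stats

if __name__ == "__main__":
    dump = subprocess.run(["/usr/lpp/mmfs/bin/mmfsadm", "dump", "nsd"],
                          stdout=subprocess.PIPE, text=True).stdout
    for qtype, (nq, pend, maxpend, active, started) in sorted(summarize(dump).items()):
        print("%d %s queues, %d requests pending (peak %d), %d/%d threads active"
              % (nq, qtype, pend, maxpend, active, started))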
If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 From andreas.koeninger at de.ibm.com Fri Jul 6 07:38:07 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Fri, 6 Jul 2018 06:38:07 +0000 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> References: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> Message-ID: An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Fri Jul 6 14:02:38 2018 From: jjdoherty at yahoo.com (Jim Doherty) Date: Fri, 6 Jul 2018 13:02:38 +0000 (UTC) Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: <733478365.61492.1530882158667@mail.yahoo.com> You may want to get an mmtrace,? but I suspect that the disk IOs are slow.???? The iohist is showing the time from when the start IO was issued until it was finished.??? 
Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it.??? If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue.? While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week.? You?re correct about our mixed workload.? There have been no new workloads that I am aware of. Stephen - no, this is not an ESS.? We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN.? Commodity hardware for the servers and storage.? We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks.? Linux multipathing handles path failures.? 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time).? So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array.? As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output.? We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. ??? 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. ??? 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. ??? 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. ??? 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. ??? 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. ??? 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. ??? 504 threads started, 8 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. ??? 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. ??? 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. ??? 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. ??? 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. ??? 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. ??? 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. ??? 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. ??? 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. ??? 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity.? Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized.? And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB.? That has not made a difference.? How can I determine how much of the pagepool is actually being used, BTW?? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns.? The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way.? The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now.? If you have read this entire very long e-mail, first off, thank you!? If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! 
Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why.? One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related.? In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk.? But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for??? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From emanners at fsu.edu Fri Jul 6 14:05:32 2018 From: emanners at fsu.edu (Edson Manners) Date: Fri, 6 Jul 2018 13:05:32 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: Ok. I'm on 4.2.3-5. So would this bug still show up if my remote filesystem is mounted? Because it is. Thanks. On 7/6/2018 2:38:21 AM, Andreas Koeninger wrote: Which version are you using? There was a bug in 4.2.3.6 and before related to unmounted remote filesystems which could lead to a gui_cluster_down event on the local cluster. 
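To rule that out quickly you can check whether the remote filesystems are really mounted everywhere the GUI expects. A small sketch is below; it assumes the "mmlsmount all -L" form of the command and its usual "is mounted on N nodes" wording, and it is not the GUI's own check.

import re
import subprocess

out = subprocess.run(["/usr/lpp/mmfs/bin/mmlsmount", "all", "-L"],
                     stdout=subprocess.PIPE, text=True, check=True).stdout

# flag filesystems that are not mounted anywhere (wording assumed; adjust to your output)
for line in out.splitlines():
    m = re.search(r"File system (\S+) is mounted on (\d+) nodes", line)
    if m and int(m.group(2)) == 0:
        print("not mounted anywhere:", m.group(1))
    elif "not mounted" in line:
        print(line.strip())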
Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: Edson Manners Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] GPFS GUI Date: Thu, Jul 5, 2018 11:38 PM There was another thread on here about the following error in the GUI: Event name: gui_cluster_down Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. But it looks like the resolution happened in another channel. I have the exact same problem even though we're running a production GPFS cluster that seems to work perfectly fine. This is the last error in the GUI that I'm trying to get solved. What would be the best way to try to troubleshoot this. -- [Any errors in spelling, tact or fact are transmission errors] - (Stolen from) Dag Wieers Edson Manners Research Computing Center FSU Information Technology Services Dirac Science Library., Room 150G Tallahassee, Florida 32306-4120 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Fri Jul 6 14:31:32 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Fri, 6 Jul 2018 13:31:32 +0000 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 6 15:27:51 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 6 Jul 2018 14:27:51 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <733478365.61492.1530882158667@mail.yahoo.com> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: Hi Jim, Thank you for your response. We are taking a two-pronged approach at this point: 1. While I don?t see anything wrong with our storage arrays, I have opened a ticket with the vendor (not IBM) to get them to look at things from that angle. 2. Since the problem moves around from time to time, we are enhancing our monitoring script to see if we can basically go from ?mmdiag ?iohist? to ?clients issuing those I/O requests? to ?jobs running on those clients? to see if there is any commonality there. Thanks again - much appreciated! ? 
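For the second item, the core of such a script can be quite small: scan "mmdiag --iohist" on each NSD server and tally which peer addresses are behind the slow I/Os. The sketch below assumes the "time ms" value is the sixth whitespace-separated field and that the last field names the peer (client or NSD node); check both against the header line of your release before trusting the numbers.

import subprocess
from collections import Counter

THRESHOLD_MS = 1000.0   # flag I/Os slower than this
TIME_MS_FIELD = 5       # assumed position of the "time ms" column (0-based)
PEER_FIELD = -1         # assumed position of the client/NSD node column

def slow_io_peers(threshold=THRESHOLD_MS):
    out = subprocess.run(["/usr/lpp/mmfs/bin/mmdiag", "--iohist"],
                         stdout=subprocess.PIPE, text=True).stdout
    peers = Counter()
    for line in out.splitlines():
        fields = line.split()
        if len(fields) <= TIME_MS_FIELD:
            continue
        try:
            ms = float(fields[TIME_MS_FIELD])
        except ValueError:
            continue            # header or separator line
        if ms >= threshold:
            peers[fields[PEER_FIELD]] += 1
    return peers

if __name__ == "__main__":
    for peer, count in slow_io_peers().most_common():
        print("%5d slow I/Os involving %s" % (count, peer))

Mapping a client node to the jobs running on it is then a scheduler query and is site-specific.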
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 6, 2018, at 8:02 AM, Jim Doherty > wrote: You may want to get an mmtrace, but I suspect that the disk IOs are slow. The iohist is showing the time from when the start IO was issued until it was finished. Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it. If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L > wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. 
The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Fri Jul 6 18:13:26 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Fri, 6 Jul 2018 10:13:26 -0700 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: Hi Kevin, This is a bit of a "cargo cult" suggestion but one issue that I have seen is if a disk starts misbehaving a bit but does not fail, it slows down the whole raid group that it is in. And the only way to detect it is to examine the read/write latencies on the individual disks. Does your SAN allow you to do that? That happened to me at least twice in my life and replacing the offending individual disk solved the issue. This was on DDN, so the relevant command were something like 'show pd * counters write_lat' or similar, which showed the latency for the I/Os for each disk. If one disk in the group is an outlier (e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for that one disk. Another possibility for troubleshooting, if you have sufficient free resources: you can just suspend the problematic LUNs in GPFS, as that will remove the write load from them, while still having them service read requests and not affecting users. Regards, Alex On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Jim, > > Thank you for your response. We are taking a two-pronged approach at this > point: > > 1. While I don?t see anything wrong with our storage arrays, I have > opened a ticket with the vendor (not IBM) to get them to look at things > from that angle. > > 2. Since the problem moves around from time to time, we are enhancing our > monitoring script to see if we can basically go from ?mmdiag ?iohist? to > ?clients issuing those I/O requests? to ?jobs running on those clients? to > see if there is any commonality there. > > Thanks again - much appreciated! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > On Jul 6, 2018, at 8:02 AM, Jim Doherty wrote: > > You may want to get an mmtrace, but I suspect that the disk IOs are > slow. The iohist is showing the time from when the start IO was issued > until it was finished. Of course if you have disk IOs taking 10x too > long then other IOs are going to queue up behind it. If there are more > IOs than there are NSD server threads then there are going to be IOs that > are queued and waiting for a thread. > > Jim > > > On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L < > Kevin.Buterbaugh at Vanderbilt.Edu> wrote: > > > Hi All, > > First off, my apologies for the delay in responding back to the list ? > we?ve actually been working our tails off on this one trying to collect as > much data as we can on what is a very weird issue. While I?m responding to > Aaron?s e-mail, I?m going to try to address the questions raised in all the > responses. > > Steve - this all started last week. You?re correct about our mixed > workload. There have been no new workloads that I am aware of. > > Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. > > Aaron - no, this is not on a DDN, either. > > The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the > servers and storage. We have two SAN ?stacks? 
and all NSD servers and > storage are connected to both stacks. Linux multipathing handles path > failures. 10 GbE out to the network. > > We first were alerted to this problem by one of our monitoring scripts > which was designed to alert us to abnormally high I/O times, which, as I > mentioned previously, in our environment has usually been caused by cache > battery backup failures in the storage array controllers (but _not_ this > time). So I?m getting e-mails that in part read: > > Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. > Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. > > The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN > on that storage array. As I?ve mentioned, those two LUNs are by far and > away my most frequent problem children, but here?s another report from > today as well: > > Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. > Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. > Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. > Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. > > NSD server hostnames have been changed, BTW, from their real names to nsd1 > - 8. > > Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm > dump nsd? output. We wrote a Python script to pull out what we think is > the most pertinent information: > > nsd1 > 29 SMALL queues, 50 requests pending, 3741 was the highest number of > requests pending. > 348 threads started, 1 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 0 requests pending, 5694 was the highest number of > requests pending. > 348 threads started, 124 threads active, 348 was the highest number of > threads active. > nsd2 > 29 SMALL queues, 0 requests pending, 1246 was the highest number of > requests pending. > 348 threads started, 13 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 470 requests pending, 2404 was the highest number of > requests pending. > 348 threads started, 340 threads active, 348 was the highest number of > threads active. > nsd3 > 29 SMALL queues, 108 requests pending, 1796 was the highest number of > requests pending. > 348 threads started, 0 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 35 requests pending, 3331 was the highest number of > requests pending. > 348 threads started, 4 threads active, 348 was the highest number of > threads active. > nsd4 > 42 SMALL queues, 0 requests pending, 1529 was the highest number of > requests pending. > 504 threads started, 8 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 637 was the highest number of > requests pending. > 504 threads started, 211 threads active, 504 was the highest number of > threads active. > nsd5 > 42 SMALL queues, 182 requests pending, 2798 was the highest number of > requests pending. > 504 threads started, 6 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 407 requests pending, 4416 was the highest number of > requests pending. > 504 threads started, 8 threads active, 504 was the highest number of > threads active. > nsd6 > 42 SMALL queues, 0 requests pending, 1630 was the highest number of > requests pending. > 504 threads started, 0 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 148 was the highest number of > requests pending. 
> 504 threads started, 9 threads active, 504 was the highest number of > threads active. > nsd7 > 42 SMALL queues, 43 requests pending, 2179 was the highest number of > requests pending. > 504 threads started, 1 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 2551 was the highest number of > requests pending. > 504 threads started, 13 threads active, 504 was the highest number of > threads active. > nsd8 > 42 SMALL queues, 0 requests pending, 1014 was the highest number of > requests pending. > 504 threads started, 4 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 3371 was the highest number of > requests pending. > 504 threads started, 89 threads active, 504 was the highest number of > threads active. > > Note that we see more ?load? on the LARGE queue side of things and that > nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most > frequently in our alerts) are the heaviest loaded. > > One other thing we have noted is that our home grown RRDtool monitoring > plots that are based on netstat, iostat, vmstat, etc. also show an oddity. > Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 > (there are 4 in total) show up as 93 - 97% utilized. And another oddity > there is that eon34A and eon34B rarely show up on the alert e-mails, while > eon34C and eon34E show up waaaayyyyyyy more than anything else ? the > difference between them is that A and B are on the storage array itself and > C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve > actually checked and reseated those connections). > > Another reason why I could not respond earlier today is that one of the > things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 > from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool > on those two boxes to 40 GB. That has not made a difference. How can I > determine how much of the pagepool is actually being used, BTW? A quick > Google search didn?t help me. > > So we?re trying to figure out if we have storage hardware issues causing > GPFS issues or GPFS issues causing storage slowdowns. The fact that I see > slowdowns most often on one storage array points in one direction, while > the fact that at times I see even worse slowdowns on multiple other arrays > points the other way. The fact that some NSD servers show better stats > than others in the analysis of the ?mmfsadm dump nsd? output tells me ? > well, I don?t know what it tells me. > > I think that?s all for now. If you have read this entire very long > e-mail, first off, thank you! If you?ve read it and have ideas for where I > should go from here, T-H-A-N-K Y-O-U! > > Kevin > > > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > > > Hi Kevin, > > > > Just going out on a very weird limb here...but you're not by chance > seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. > SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high > latency on some of our SFA12ks (that have otherwise been solid both in > terms of stability and performance) but only on certain volumes and the > affected volumes change. It's very bizzarre and we've been working closely > with DDN to track down the root cause but we've not yet found a smoking > gun. The timing and description of your problem sounded eerily similar to > what we're seeing so I'd thought I'd ask. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > > > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > > > >> Hi all, > >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some > of our NSDs as reported by ?mmdiag ?iohist" and are struggling to > understand why. One of the > >> confusing things is that, while certain NSDs tend to show the problem > more than others, the problem is not consistent ? i.e. the problem tends to > move around from > >> NSD to NSD (and storage array to storage array) whenever we check ? > which is sometimes just a few minutes apart. > >> In the past when I have seen ?mmdiag ?iohist? report high wait times > like this it has *always* been hardware related. In our environment, the > most common cause has > >> been a battery backup unit on a storage array controller going bad and > the storage array switching to write straight to disk. But that?s *not* > happening this time. > >> Is there anything within GPFS / outside of a hardware issue that I > should be looking for?? Thanks! > >> ? > >> Kevin Buterbaugh - Senior System Administrator > >> Vanderbilt University - Advanced Computing Center for Research and > Education > >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 6 22:03:09 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 6 Jul 2018 22:03:09 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: <865c5f52-fa62-571f-aeef-9b1073dfa156@strath.ac.uk> On 06/07/18 02:11, Buterbaugh, Kevin L wrote: [SNIP] > > The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for > the servers and storage. We have two SAN ?stacks? and all NSD > servers and storage are connected to both stacks. Linux multipathing > handles path failures. 10 GbE out to the network. You don't mention it, but have you investigated your FC fabric? 
Dodgy laser, bad photodiode or damaged fibre can cause havoc. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Kevin.Buterbaugh at Vanderbilt.Edu Sat Jul 7 01:28:06 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sat, 7 Jul 2018 00:28:06 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> Hi All, Another update on this issue as we have made significant progress today ? but first let me address the two responses I received. Alex - this is a good idea and yes, we did this today. We did see some higher latencies on one storage array as compared to the others. 10-20 ms on the ?good? storage arrays ? 50-60 ms on the one storage array. It took us a while to be able to do this because while the vendor provides a web management interface, that didn?t show this information. But they have an actual app that will ? and the Mac and Linux versions don?t work. So we had to go scrounge up this thing called a Windows PC and get the software installed there. ;-) Jonathan - also a good idea and yes, we also did this today. I?ll explain as part of the rest of this update. The main thing that we did today that has turned out to be most revealing is to take a list of all the NSDs in the impacted storage pool ? 19 devices spread out over 7 storage arrays ? and run read dd tests on all of them (the /dev/dm-XX multipath device). 15 of them showed rates of 33 - 100+ MB/sec and the variation is almost definitely explained by the fact that they?re in production use and getting hit by varying amounts of ?real? work. But 4 of them showed rates of 2-10 MB/sec and those 4 all happen to be on storage array eon34. So, to try to rule out everything but the storage array we replaced the FC cables going from the SAN switches to the array, plugging the new cables into different ports on the SAN switches. Then we repeated the dd tests from a different NSD server, which both eliminated the NSD server and its? FC cables as a potential cause ? and saw results virtually identical to the previous test. Therefore, we feel pretty confident that it is the storage array and have let the vendor know all of this. And there?s another piece of quite possibly relevant info ? the last week in May one of the controllers in this array crashed and rebooted (it?s a active-active dual controller array) ? when that happened the failover occurred ? with a major glitch. One of the LUNs essentially disappeared ? more accurately, it was there, but had no size! We?ve been using this particular vendor for 15 years now and I have seen more than a couple of their controllers go bad during that time and nothing like this had ever happened before. They were never able to adequately explain what happened there. So what I am personally suspecting has happened is that whatever caused that one LUN to go MIA has caused these issues with the other LUNs on the array. As an aside, we ended up using mmfileid to identify the files that had blocks on the MIA LUN and restored those from tape backup. I want to thank everyone who has offered their suggestions so far. I will update the list again once we have a definitive problem determination. I hope that everyone has a great weekend. 
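For anyone who wants to run the same sort of check, a test along the following lines will do; the device names, block size and flags here are placeholders rather than what we actually ran, and direct I/O is used so the page cache does not flatter the numbers:

  #!/bin/bash
  # Rough sequential read test of the multipath devices backing a set of NSDs.
  # Device names below are examples only.
  for dev in /dev/dm-10 /dev/dm-11 /dev/dm-12; do
      echo "=== ${dev} ==="
      # Read 4 GiB straight off the device, bypassing the page cache,
      # and keep just the throughput summary that dd prints on stderr.
      dd if="${dev}" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -1
  done

A LUN reading an order of magnitude slower than its peers under the same test, as the eon34 devices did, points at the array or the path to it rather than at GPFS itself.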
In the immortal words of the wisest man who ever lived, ?I?m kinda tired ? think I?ll go home now.? ;-) Kevin On Jul 6, 2018, at 12:13 PM, Alex Chekholko > wrote: Hi Kevin, This is a bit of a "cargo cult" suggestion but one issue that I have seen is if a disk starts misbehaving a bit but does not fail, it slows down the whole raid group that it is in. And the only way to detect it is to examine the read/write latencies on the individual disks. Does your SAN allow you to do that? That happened to me at least twice in my life and replacing the offending individual disk solved the issue. This was on DDN, so the relevant command were something like 'show pd * counters write_lat' or similar, which showed the latency for the I/Os for each disk. If one disk in the group is an outlier (e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for that one disk. Another possibility for troubleshooting, if you have sufficient free resources: you can just suspend the problematic LUNs in GPFS, as that will remove the write load from them, while still having them service read requests and not affecting users. Regards, Alex On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L > wrote: Hi Jim, Thank you for your response. We are taking a two-pronged approach at this point: 1. While I don?t see anything wrong with our storage arrays, I have opened a ticket with the vendor (not IBM) to get them to look at things from that angle. 2. Since the problem moves around from time to time, we are enhancing our monitoring script to see if we can basically go from ?mmdiag ?iohist? to ?clients issuing those I/O requests? to ?jobs running on those clients? to see if there is any commonality there. Thanks again - much appreciated! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 6, 2018, at 8:02 AM, Jim Doherty > wrote: You may want to get an mmtrace, but I suspect that the disk IOs are slow. The iohist is showing the time from when the start IO was issued until it was finished. Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it. If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L > wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). 
So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. 
In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Caa277914313f445d702e08d5e363d347%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664940252877301&sdata=bnjsWHwutbbKstghBrB5Y7%2FIzeX7U19vroW%2B0xA2gX8%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Jul 7 09:42:57 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 7 Jul 2018 09:42:57 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> Message-ID: <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. ?Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. ?Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. 
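For example, on a Brocade FOS switch the per-port error counters can be watched along these lines (commands quoted from memory, so check them against your FOS release; Cisco MDS has equivalents under 'show interface counters'):

  # Clear the accumulated counters so anything seen afterwards is current:
  statsclear
  slotstatsclear
  # ...let the workload run for a while, then look for ports whose error
  # columns (crc err, enc out, disc c3, link fail, ...) keep climbing:
  porterrshow

A port that keeps accumulating CRC or encoding errors usually means a marginal SFP, cable or patch panel on that link.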
The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Cameron.Dunn at bristol.ac.uk Fri Jul 6 17:36:14 2018 From: Cameron.Dunn at bristol.ac.uk (Cameron Dunn) Date: Fri, 6 Jul 2018 16:36:14 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , Message-ID: Thanks Christof, we had left out "gpfs" from the vfs objects = line in smb.conf so setting vfs objects = gpfs (etc) gpfs:hsm = yes gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) and setting the offline flag on the file by migrating it, so that # mmlsattr -L filename.jpg ... Misc attributes: ARCHIVE OFFLINE now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" and a standard icon with an X is displayed. But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, so we still risk a recall storm caused by them. All the best, Cameron ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Christof Schmitt Sent: 03 July 2018 20:37:08 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms > HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape > are then shared by Samba to Macs and PCs. > MacOS Finder and Windows Explorer will want to display all the thumbnail images of a > folder's contents, which will recall lots of files from tape. SMB clients can query file information, including the OFFLINE flag. With Spectrum Scale and the "gpfs" module loaded in Samba that is mapped from the the OFFLINE flag that is visible in "mmlsattr -L". In those systems, the SMB client can determine that a file is offline. In our experience this is handled correctly in Windows Explorer; when an "offline" file is encountered, no preview is generated from the file data. The Finder on Mac clients does not seem to honor the OFFLINE flag, thus the main problems are typically recall storms caused by Mac clients. > According to the Samba documentation this is preventable by setting the following > ---------------------------------------------- > https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html > > gpfs:recalls = [ yes | no ] > When this option is set to no, an attempt to open an offline file > will be rejected with access denied. > This helps preventing recall storms triggered by careless applications like Finder and Explorer. > > yes(default) - Open files that are offline. This will recall the files from HSM. > no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. > Using this setting also requires gpfs:hsm to be set to yes. > > gpfs:hsm = [ yes | no ] > Enable/Disable announcing if this FS has HSM enabled. > no(default) - Do not announce HSM. > yes - Announce HSM. > -------------------------------------------------- > > However we could not get this to work. 
> > On Centos7/Samba4.5, smb.conf contained > gpfs:hsm = yes > gpfs:recalls = no > (also tried setting gpfs:offline = yes, though this is not documented) These options apply to the "gpfs" module in Samba. The Samba version you are using needs to be built with GPFS support and the "gpfs" module needs to be loaded through the "vfs objects" configuration. As Centos7/Samba4.5 is mentioned, would guess that the CentOS provided Samba version is used, which is probably not compiled with GPFS support. >From IBM we would recommend to use CES for protocol services, which also provides Samba for SMB. The Samba provided through CES is configured so that the gpfs:recalls option can be used: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmsmb.htm gpfs:recalls If the value is set as yes files that have been migrated from disk will be recalled on access. By default, this is enabled. If recalls = no files will not be recalled on access and the client will receive ACCESS_DENIED message. > We made a share containing image files that were then migrated to tape by LTFS-EE, > to see if these flags were respected by OS X Finder or Windows Explorer. > > Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, > so that when browsing the stubs in the share, the files were recalled from tape > and the thumbnails displayed. > > Has anyone seen these flags working as they are supposed to ? Yes, they are working, as we use them in our Samba build. Debugging this would require looking at the Samba configuration and possibly collecting a trace. If my above assumption was wrong and this problem occurs with the CES Samba (gpfs.smb), please open a PMR for debugging this issue. If this is not the CES Samba, please contact the provider of the Samba package for additional support. Regards, Christof Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] preventing HSM tape recall storms Date: Tue, Jul 3, 2018 6:22 AM HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. 
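To make that concrete, a share that refuses to recall migrated files would look roughly like this in smb.conf terms (share name and path are made-up examples; with the CES-provided gpfs.smb the same per-share options are set through mmsmb rather than by editing smb.conf directly):

  [archive]
      path = /gpfs/gpfs0/archive
      read only = no
      vfs objects = gpfs
      gpfs:hsm = yes
      gpfs:recalls = no

With that in place, an SMB client that tries to open a file whose OFFLINE flag is set gets ACCESS_DENIED instead of triggering a recall from tape.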
On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Sun Jul 8 18:32:25 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 8 Jul 2018 20:32:25 +0300 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu><397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu><733478365.61492.1530882158667@mail.yahoo.com><1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> Message-ID: Hi Clean all counters on the FC switches and see which port have errors . For brocade run : slotstatsclear statsclear porterrshow For cisco run: clear countersall There might be bad gbic/cable/Storage gbic, which can affect the performance, if there is something like that - u can see which ports have errors grow over time. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 07/07/2018 11:43 AM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=TM-kJsvzTX9cq_xmR5ITHclBCfO4FDvZ3ZxyugfJCfQ&s=Ass164qVEhb9fC4_VCmzfZeYd_BLOv9cZsfkrzqi8pM&e=
From chris.schlipalius at pawsey.org.au Mon Jul 9 01:36:01 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 09 Jul 2018 08:36:01 +0800 Subject: [gpfsug-discuss] Upcoming meeting: Australian Spectrum Scale Usergroup 10th August 2018 Sydney Message-ID: <2BD2D9AA-774D-4D6E-A2E6-069E7E91F40E@pawsey.org.au> Dear members, Please note the next Australian Usergroup is confirmed. If you plan to attend, please register: http://bit.ly/2NiNFEQ Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au
From r.sobey at imperial.ac.uk Mon Jul 9 09:51:25 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 9 Jul 2018 08:51:25 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: Did you upgrade the memory etc purely as a "maybe this will help" fix? If so, and it didn't help, I'd be tempted to reduce it again as you may introduce another problem into the environment. I wonder if your disks are about to die, although I suspect you'd have already been forewarned of errors from the disk(s) via the storage system.
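From the host side, one quick way to spot a single LUN going marginal before the array owns up to it is the extended iostat view from sysstat, which is also what the home-grown RRDtool graphs mentioned earlier in the thread are fed from; for example:

  # Per-device latency and utilisation in 10-second samples; -N shows
  # device-mapper devices under their multipath names.
  iostat -xmN 10
  # Watch the await (ms) and %util columns: a LUN sitting far above its
  # peers under comparable load is the one to chase.

That lines up with Alex's earlier suggestion of looking at the per-disk latencies inside the array itself.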
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2018 02:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] High I/O wait times Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! 
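(For anyone wanting to build similar alerting, a minimal starting point might look like the sketch below. This is not our actual script, and the awk field number for the "time ms" column is an assumption that needs checking against the 'mmdiag --iohist' output of your own release, since the layout shifts a little between versions.)

  #!/bin/bash
  # Flag any I/O in the recent history whose service time exceeds 1000 ms.
  # Run periodically on each NSD server; adjust the field number ($6 here)
  # to wherever the "time ms" column sits in your mmdiag output.
  THRESHOLD=1000
  /usr/lpp/mmfs/bin/mmdiag --iohist | awk -v t="$THRESHOLD" \
      '$6 ~ /^[0-9.]+$/ && $6+0 > t { print "Slow I/O:", $0 }'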
Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight > Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on >> some of our NSDs as reported by ?mmdiag ?iohist" and are struggling >> to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times >> like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator Vanderbilt University >> - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug > .org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterb > augh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3b > e4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D% > 2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Mon Jul 9 17:57:18 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 9 Jul 2018 09:57:18 -0700 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Hello, Can you provide the Windows OS and GPFS versions. Does the mmmount hang indefinitely or for a finite time (like 30 seconds or so)? Do you see any GPFS waiters during the mmmount hang? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michael Holliday To: gpfsug main discussion list Date: 07/05/2018 08:12 AM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Those commands show no errors not do any of the log files. GPFS has started correctly and showing the cluster and all nodes as up and active. We appear to have found the command that is hanging during the mount - However I?m not sure why its hanging. mmwmi mountedfilesystems Michael From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Yaron Daniel Sent: 20 June 2018 16:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Windows Mount Also what does mmdiag --network + mmgetstate -a show ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Yaron Daniel" To: gpfsug main discussion list Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 
06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From christof.schmitt at us.ibm.com Mon Jul 9 19:53:36 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 9 Jul 2018 18:53:36 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... URL:
From stockf at us.ibm.com Mon Jul 9 19:57:38 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 9 Jul 2018 14:57:38 -0400 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: Another option is to request Apple to support the OFFLINE flag in the SMB protocol. The more Mac customers making such a request (I have asked others to do likewise) might convince Apple to add this checking to their SMB client.
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Christof Schmitt" To: gpfsug-discuss at spectrumscale.org Date: 07/09/2018 02:53 PM Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Sent by: gpfsug-discuss-bounces at spectrumscale.org > we had left out "gpfs" from the > vfs objects = > line in smb.conf > > so setting > vfs objects = gpfs (etc) > gpfs:hsm = yes > gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) Thank you for the update. gpfs:recalls=yes is the default, allowing recalls of files. If you set that to 'no', Samba will deny access to "OFFLINE" files in GPFS through SMB. > and setting the offline flag on the file by migrating it, so that > # mmlsattr -L filename.jpg > ... > Misc attributes: ARCHIVE OFFLINE > > now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" > > and a standard icon with an X is displayed. > > But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Yes, that is working as intended. While the file is only in the "external pool" (e.g. HSM tape), the OFFLINE flag is reported. Once you read/write data, that triggers a recall to the disk pool and the flag is cleared. > Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, > > so we still risk a recall storm caused by them. The question here would be how to handle the Mac clients. You could configured two SMB shares on the same path: One with gpfs:recalls=yes and tell the Windows users to access that share; the other one with gpfs:recalls=no and tell the Mac users to use that share. That would avoid the recall storms, but runs the risk of Mac users connecting to the wrong share and avoiding this workaround... Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Date: Sat, Jul 7, 2018 2:30 PM Thanks Christof, we had left out "gpfs" from the vfs objects = line in smb.conf so setting vfs objects = gpfs (etc) gpfs:hsm = yes gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) and setting the offline flag on the file by migrating it, so that # mmlsattr -L filename.jpg ... Misc attributes: ARCHIVE OFFLINE now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" and a standard icon with an X is displayed. But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, so we still risk a recall storm caused by them. All the best, Cameron From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Christof Schmitt Sent: 03 July 2018 20:37:08 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms > HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape > are then shared by Samba to Macs and PCs. > MacOS Finder and Windows Explorer will want to display all the thumbnail images of a > folder's contents, which will recall lots of files from tape. 
SMB clients can query file information, including the OFFLINE flag. With Spectrum Scale and the "gpfs" module loaded in Samba that is mapped from the the OFFLINE flag that is visible in "mmlsattr -L". In those systems, the SMB client can determine that a file is offline. In our experience this is handled correctly in Windows Explorer; when an "offline" file is encountered, no preview is generated from the file data. The Finder on Mac clients does not seem to honor the OFFLINE flag, thus the main problems are typically recall storms caused by Mac clients. > According to the Samba documentation this is preventable by setting the following > ---------------------------------------------- > https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html > > gpfs:recalls = [ yes | no ] > When this option is set to no, an attempt to open an offline file > will be rejected with access denied. > This helps preventing recall storms triggered by careless applications like Finder and Explorer. > > yes(default) - Open files that are offline. This will recall the files from HSM. > no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. > Using this setting also requires gpfs:hsm to be set to yes. > > gpfs:hsm = [ yes | no ] > Enable/Disable announcing if this FS has HSM enabled. > no(default) - Do not announce HSM. > yes - Announce HSM. > -------------------------------------------------- > > However we could not get this to work. > > On Centos7/Samba4.5, smb.conf contained > gpfs:hsm = yes > gpfs:recalls = no > (also tried setting gpfs:offline = yes, though this is not documented) These options apply to the "gpfs" module in Samba. The Samba version you are using needs to be built with GPFS support and the "gpfs" module needs to be loaded through the "vfs objects" configuration. As Centos7/Samba4.5 is mentioned, would guess that the CentOS provided Samba version is used, which is probably not compiled with GPFS support. >From IBM we would recommend to use CES for protocol services, which also provides Samba for SMB. The Samba provided through CES is configured so that the gpfs:recalls option can be used: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmsmb.htm gpfs:recalls If the value is set as yes files that have been migrated from disk will be recalled on access. By default, this is enabled. If recalls = no files will not be recalled on access and the client will receive ACCESS_DENIED message. > We made a share containing image files that were then migrated to tape by LTFS-EE, > to see if these flags were respected by OS X Finder or Windows Explorer. > > Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, > so that when browsing the stubs in the share, the files were recalled from tape > and the thumbnails displayed. > > Has anyone seen these flags working as they are supposed to ? Yes, they are working, as we use them in our Samba build. Debugging this would require looking at the Samba configuration and possibly collecting a trace. If my above assumption was wrong and this problem occurs with the CES Samba (gpfs.smb), please open a PMR for debugging this issue. If this is not the CES Samba, please contact the provider of the Samba package for additional support. 
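As an illustration of the two-share workaround described above, here is a minimal smb.conf sketch for a Samba build that includes the gpfs VFS module (the share names and the exported path are assumptions, not taken from the original thread):

# Both shares export the same HSM-managed directory; only gpfs:recalls differs.
# Windows clients use [projects]: offline files are recalled on open.
[projects]
    path = /gpfs/fs1/projects
    vfs objects = gpfs
    gpfs:hsm = yes
    gpfs:recalls = yes
# Mac clients use [projects-noretrieve]: opening an offline file returns
# ACCESS_DENIED instead of triggering a tape recall.
[projects-noretrieve]
    path = /gpfs/fs1/projects
    vfs objects = gpfs
    gpfs:hsm = yes
    gpfs:recalls = no

On a CES cluster the equivalent per-export setting would be made through the mmsmb command referenced above rather than by editing smb.conf directly.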
Regards, Christof Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] preventing HSM tape recall storms Date: Tue, Jul 3, 2018 6:22 AM HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 20:31:32 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 19:31:32 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 9 21:21:29 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 9 Jul 2018 20:21:29 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> I don?t think you can do it directly, but you could probably use FileHeat to figure it out indirectly. Look at mmchconfig on how to set these: fileHeatLossPercent 20 fileHeatPeriodMinutes 1440 Then you can run a fairly simple policy scan to dump out the file names and heat value, sort what?s the most active to the top. I?ve done this, and it can prove helpful: define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Monday, July 9, 2018 at 3:04 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] What NSDs does a file have blocks on? Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Mon Jul 9 21:51:34 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 9 Jul 2018 16:51:34 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> References: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Message-ID: Hi Kevin, >>I want to know what NSDs a single file has its? blocks on? You may use /usr/lpp/mmfs/samples/fpo/mmgetlocationto obtain the file-to-NSD block layout map. Use the -h option for this tools usage ( mmgetlocation -h). Sample output is below: # File-system block size is 4MiB and sample file is 40MiB. 
# ls -lh /mnt/gpfs3a/data_out/lf -rw-r--r-- 1 root root 40M Jul 9 16:42 /mnt/gpfs3a/data_out/lf # du -sh /mnt/gpfs3a/data_out/lf 40M /mnt/gpfs3a/data_out/lf # mmlsfs gpfs3a | grep 'Block size' -B 4194304 Block size # The file data is striped across 10 x NSDs (DMD_NSDX) constituting the file-system # /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /mnt/gpfs3a/data_out/lf [FILE /mnt/gpfs3a/data_out/lf INFORMATION] FS_DATA_BLOCKSIZE : 4194304 (bytes) FS_META_DATA_BLOCKSIZE : 4194304 (bytes) FS_FILE_DATAREPLICA : 1 FS_FILE_METADATAREPLICA : 1 FS_FILE_STORAGEPOOLNAME : system FS_FILE_ALLOWWRITEAFFINITY : no FS_FILE_WRITEAFFINITYDEPTH : 0 FS_FILE_BLOCKGROUPFACTOR : 1 chunk(s)# 0 (offset 0) : [DMD_NSD5 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 1 (offset 4194304) : [DMD_NSD6 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 2 (offset 8388608) : [DMD_NSD7 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 3 (offset 12582912) : [DMD_NSD8 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 4 (offset 16777216) : [DMD_NSD9 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 5 (offset 20971520) : [DMD_NSD10 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 6 (offset 25165824) : [DMD_NSD1 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 7 (offset 29360128) : [DMD_NSD2 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 8 (offset 33554432) : [DMD_NSD3 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 9 (offset 37748736) : [DMD_NSD4 c72f1m5u39ib0,c72f1m5u37ib0] [FILE: /mnt/gpfs3a/data_out/lf SUMMARY INFO] replica1: c72f1m5u37ib0,c72f1m5u39ib0: 5 chunk(s) c72f1m5u39ib0,c72f1m5u37ib0: 5 chunk(s) Thanks and Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/09/2018 04:05 PM Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 9 22:04:15 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 9 Jul 2018 17:04:15 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> Message-ID: (psss... ) tsdbfs Not responsible for anything bad that happens...! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 22:03:21 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 21:03:21 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: References: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Message-ID: <7D0DA547-4C19-4AE8-AFF8-BB0FBBF487AA@vanderbilt.edu> Hi Kums, Thanks so much ? this gave me exactly what I was looking for and the output was what I suspected I would see. 
Unfortunately, that means that the mystery of why we?re having these occasional high I/O wait times persists, but oh well? Kevin On Jul 9, 2018, at 3:51 PM, Kumaran Rajaram > wrote: Hi Kevin, >>I want to know what NSDs a single file has its? blocks on? You may use /usr/lpp/mmfs/samples/fpo/mmgetlocationto obtain the file-to-NSD block layout map. Use the -h option for this tools usage (mmgetlocation -h). Sample output is below: # File-system block size is 4MiB and sample file is 40MiB. # ls -lh /mnt/gpfs3a/data_out/lf -rw-r--r-- 1 root root 40M Jul 9 16:42 /mnt/gpfs3a/data_out/lf # du -sh /mnt/gpfs3a/data_out/lf 40M /mnt/gpfs3a/data_out/lf # mmlsfs gpfs3a | grep 'Block size' -B 4194304 Block size # The file data is striped across 10 x NSDs (DMD_NSDX) constituting the file-system # /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /mnt/gpfs3a/data_out/lf [FILE /mnt/gpfs3a/data_out/lf INFORMATION] FS_DATA_BLOCKSIZE : 4194304 (bytes) FS_META_DATA_BLOCKSIZE : 4194304 (bytes) FS_FILE_DATAREPLICA : 1 FS_FILE_METADATAREPLICA : 1 FS_FILE_STORAGEPOOLNAME : system FS_FILE_ALLOWWRITEAFFINITY : no FS_FILE_WRITEAFFINITYDEPTH : 0 FS_FILE_BLOCKGROUPFACTOR : 1 chunk(s)# 0 (offset 0) : [DMD_NSD5 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 1 (offset 4194304) : [DMD_NSD6 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 2 (offset 8388608) : [DMD_NSD7 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 3 (offset 12582912) : [DMD_NSD8 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 4 (offset 16777216) : [DMD_NSD9 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 5 (offset 20971520) : [DMD_NSD10 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 6 (offset 25165824) : [DMD_NSD1 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 7 (offset 29360128) : [DMD_NSD2 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 8 (offset 33554432) : [DMD_NSD3 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 9 (offset 37748736) : [DMD_NSD4 c72f1m5u39ib0,c72f1m5u37ib0] [FILE: /mnt/gpfs3a/data_out/lf SUMMARY INFO] replica1: c72f1m5u37ib0,c72f1m5u39ib0: 5 chunk(s) c72f1m5u39ib0,c72f1m5u37ib0: 5 chunk(s) Thanks and Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/09/2018 04:05 PM Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C523052f2a40c48efb5a808d5e5ddc6b0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636667663044944884&sdata=Q2Wg8yDwA9yu%2FZgJXELr7V3qHAY7I7eKPTBHkqVKA5I%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
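As an aside, a sketch of wiring up the FileHeat approach Bob suggested earlier in this thread (the file system name, the policy file name and the list-file prefix are assumptions; the 'fh' rules are the ones he posted, saved here as heat.pol):

# enable heat tracking with the values Bob quotes
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=20
# run the scan; -I defer with -f writes the weighted list files under the
# given prefix instead of invoking any external program
mmapplypolicy gpfs23 -P heat.pol -I defer -f /tmp/heat
# the resulting /tmp/heat.list.fh is ordered by weight, so the hottest
# files are at the top, with heat value and size in the SHOW column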
URL: From S.J.Thompson at bham.ac.uk Mon Jul 9 22:21:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 9 Jul 2018 21:21:41 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> Message-ID: <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 22:44:07 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 21:44:07 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <20180708174441.EE5BB17B422@gpfsug.org> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> <20180708174441.EE5BB17B422@gpfsug.org> Message-ID: Hi All, Time for a daily update on this saga? First off, responses to those who have responded to me: Yaron - we have QLogic switches, but I?ll RTFM and figure out how to clear the counters ? with a quick look via the CLI interface to one of them I don?t see how to even look at those counters, must less clear them, but I?ll do some digging. QLogic does have a GUI app, but given that the Mac version is PowerPC only I think that?s a dead end! :-O Jonathan - understood. We were just wanting to eliminate as much hardware as potential culprits as we could. The storage arrays will all get a power-cycle this Sunday when we take a downtime to do firmware upgrades on them ? the vendor is basically refusing to assist further until we get on the latest firmware. So ? we had noticed that things seem to calm down starting Friday evening and continuing throughout the weekend. We have a script that runs every half hour and if there?s any NSD servers where ?mmdiag ?iohist? shows an I/O > 1,000 ms, we get an alert (again, designed to alert us of a CBM failure). We only got three all weekend long (as opposed to last week, when the alerts were coming every half hour round the clock). Then, this morning I repeated the ?dd? test that I had run before and after replacing the FC cables going to ?eon34? and which had showed very typical I/O rates for all the NSDs except for the 4 in eon34, which were quite poor (~1.5 - 10 MB/sec). I ran the new tests this morning from different NSD servers and with a higher ?count? passed to dd to eliminate any potential caching effects. I ran the test twice from two different NSD servers and this morning all NSDs - including those on eon34 - showed normal I/O rates! Argh - so do we have a hardware problem or not?!? I still think we do, but am taking *nothing* for granted at this point! So today we also used another script we?ve written to do some investigation ? basically we took the script which runs ?mmdiag ?iohist? 
and added some options to it so that for every I/O greater than the threshold it will see which client issued the I/O. It then queries SLURM to see what jobs are running on that client. Interestingly enough, one user showed up waaaayyyyyy more often than anybody else. And many times she was on a node with only one other user who we know doesn?t access the GPFS filesystem and other times she was the only user on the node. We certainly recognize that correlation is not causation (she could be a victim and not the culprit), but she was on so many of the reported clients that we decided to investigate further ? but her jobs seem to have fairly modest I/O requirements. Each one processes 4 input files, which are basically just gzip?d text files of 1.5 - 5 GB in size. This is what, however, prompted my other query to the list about determining which NSDs a given file has its? blocks on. I couldn?t see how files of that size could have all their blocks on only a couple of NSDs in the pool (out of 19 total!) but wanted to verify that. The files that I have looked at are evenly spread out across the NSDs. So given that her files are spread across all 19 NSDs in the pool and the high I/O wait times are almost always only on LUNs in eon34 (and, more specifically, on two of the four LUNs in eon34) I?m pretty well convinced it?s not her jobs causing the problems ? I?m back to thinking a weird hardware issue. But if anyone wants to try to convince me otherwise, I?ll listen? Thanks! Kevin On Jul 8, 2018, at 12:32 PM, Yaron Daniel > wrote: Hi Clean all counters on the FC switches and see which port have errors . For brocade run : slotstatsclear statsclear porterrshow For cisco run: clear countersall There might be bad gbic/cable/Storage gbic, which can affect the performance, if there is something like that - u can see which ports have errors grow over time. Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org Date: 07/07/2018 11:43 AM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=TM-kJsvzTX9cq_xmR5ITHclBCfO4FDvZ3ZxyugfJCfQ&s=Ass164qVEhb9fC4_VCmzfZeYd_BLOv9cZsfkrzqi8pM&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7c1ced16f6d44055c63408d5e4fa7d2e%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636666686866066749&sdata=Viltitj3L9aScuuVKCLSp9FKkj7xdzWxsvvPVDSUqHw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 10 12:59:18 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 10 Jul 2018 11:59:18 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Tue Jul 10 13:29:59 2018 From: spectrumscale at kiranghag.com (KG) Date: Tue, 10 Jul 2018 17:59:59 +0530 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: Addendum to the question... How is this calculated? I figured out it is based on NSD sizes that are initially used but not exactly how. ?KG? On Tue, Jul 10, 2018 at 5:29 PM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > File system was originally created with 1TB NSDs (4) and I want to move it > to one 5TB NSD. Any way around this error? > > > > mmadddisk fs1 -F new.nsd > > > > The following disks of proserv will be formatted on node srv-gpfs06: > > stor1v5tb85: size 5242880 MB > > Extending Allocation Map > > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > > *Allocation map cannot accommodate disks larger than 4194555 MB.* > > Checking Allocation Map for storage pool Plevel1 > > mmadddisk: tsadddisk failed. > > Verifying file system configuration information ... > > mmadddisk: Propagating the cluster configuration data to all > > affected nodes. This is an asynchronous process. > > mmadddisk: Command failed. Examine previous error messages to determine > cause. > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
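For reference, a rough sketch of the new-pool-plus-migration route that the replies below describe (the old NSD names, the new pool name, and the stanza/policy file names are assumptions; fs1, new.nsd and stor1v5tb85 are taken from Bob's command above):

# mmdf fs1 shows the current ceiling per pool ("Maximum disk size allowed")
# new.nsd -- put the 5 TB disk into a new data-only pool instead of Plevel1
%nsd: nsd=stor1v5tb85 usage=dataOnly pool=Plevel1big
mmadddisk fs1 -F new.nsd
# drain.pol -- empty the old pool into the new one
RULE 'drain' MIGRATE FROM POOL 'Plevel1' TO POOL 'Plevel1big'
mmapplypolicy fs1 -P drain.pol -I yes
# once the old pool is empty, the original 1 TB NSDs could be removed
mmdeldisk fs1 "olddisk1;olddisk2;olddisk3;olddisk4"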
URL: From david_johnson at brown.edu Tue Jul 10 13:42:55 2018 From: david_johnson at brown.edu (David D Johnson) Date: Tue, 10 Jul 2018 08:42:55 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: Whenever we start with adding disks of new sizes/speeds/controllers/machine rooms compared to existing NSD's in the filesystem, we generally add them to a new storage pool. Add policy rules to make use of the new pools as desired, migrate stale files to slow disk, active files to faster/newer disk, etc. > On Jul 10, 2018, at 8:29 AM, KG wrote: > > Addendum to the question... > > How is this calculated? I figured out it is based on NSD sizes that are initially used but not exactly how. > > > ?KG? > > On Tue, Jul 10, 2018 at 5:29 PM, Oesterlin, Robert > wrote: > File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? > > > > mmadddisk fs1 -F new.nsd > > > > The following disks of proserv will be formatted on node srv-gpfs06: > > stor1v5tb85: size 5242880 MB > > Extending Allocation Map > > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > > Allocation map cannot accommodate disks larger than 4194555 MB. > > Checking Allocation Map for storage pool Plevel1 > > mmadddisk: tsadddisk failed. > > Verifying file system configuration information ... > > mmadddisk: Propagating the cluster configuration data to all > > affected nodes. This is an asynchronous process. > > mmadddisk: Command failed. Examine previous error messages to determine cause. > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jul 10 14:00:48 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 10 Jul 2018 14:00:48 +0100 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: <1531227648.26036.139.camel@strath.ac.uk> On Mon, 2018-07-09 at 14:57 -0400, Frederick Stock wrote: > Another option is to request Apple to support the OFFLINE flag in the > SMB protocol. ?The more Mac customers making such a request (I have > asked others to do likewise) might convince Apple to add this > checking to their SMB client. > And we have a winner. The only workable solution is to get Apple to Finder to support the OFFLINE flag. However good luck getting Apple to actually do anything. An alternative approach might be to somehow detect the client connecting is running MacOS and prohibit recalls for them. However I am not sure the Samba team would be keen on accepting such patches unless it could be done in say VFS module. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From makaplan at us.ibm.com Tue Jul 10 14:08:45 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 10 Jul 2018 09:08:45 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? 
In-Reply-To: <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> Message-ID: As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson To: gpfsug main discussion list Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jul 10 14:12:02 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 10 Jul 2018 14:12:02 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> <20180708174441.EE5BB17B422@gpfsug.org> Message-ID: <1531228322.26036.143.camel@strath.ac.uk> On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote: [SNIP] > Interestingly enough, one user showed up waaaayyyyyy more often than > anybody else. ?And many times she was on a node with only one other > user who we know doesn?t access the GPFS filesystem and other times > she was the only user on the node. ? > I have seen on our old HPC system which had been running fine for three years a particular user with a particular piece of software with presumably a particular access pattern trigger a firmware bug in a SAS drive (local disk to the node) that caused it to go offline (dead to the world and power/presence LED off) and only a power cycle of the node would bring it back. At first we through the drives where failing, because what the hell, but in the end a firmware update to the drives and they where fine. The moral of the story is don't rule out wacky access patterns from a single user causing problems. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Tue Jul 10 15:28:57 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 10 Jul 2018 16:28:57 +0200 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: Hi Bob, you sure the first added NSD was 1 TB? 
As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From p.childs at qmul.ac.uk Tue Jul 10 15:50:54 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 14:50:54 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes Message-ID: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. 
Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London From bpappas at dstonline.com Tue Jul 10 16:08:03 2018 From: bpappas at dstonline.com (Bill Pappas) Date: Tue, 10 Jul 2018 15:08:03 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms (Bill Pappas) In-Reply-To: References: Message-ID: Years back I did run a trial (to buy) software solution on OSX to address this issue. It worked! It was not cheap and they probably no longer support it anyway. It might have been from a company called Group Logic. I would suggest not exposing HSM enabled file systems (in particular ones using tape on the back end) to your general CIFS (or even) GPFS/NFS clients. It produced years (2011-2015 of frustration with recall storms that made everyone mad. If someone else had success, I think we'd all like to know how they did it....but we gave up on that. In the end I would suggest setting up an explicit archive location using/HSM tape (or low cost, high densisty disk) that is not pointing to your traditional GPFS/CIFS/NFS clients that users must deliberately access (think portal) to check in/out cold data that they can stage to their primary workspace. It is possible you considered this idea or some variation of it anyway and rejected it for good reason (e.g. more pain for the users to stage data over from cold storage to primary workspacec). Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, July 10, 2018 9:50 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 78, Issue 32 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: preventing HSM tape recall storms (Jonathan Buzzard) 2. Re: What NSDs does a file have blocks on? (Marc A Kaplan) 3. Re: High I/O wait times (Jonathan Buzzard) 4. Re: Allocation map limits - any way around this? (Uwe Falke) 5. Same file opened by many nodes / processes (Peter Childs) ---------------------------------------------------------------------- Message: 1 Date: Tue, 10 Jul 2018 14:00:48 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Message-ID: <1531227648.26036.139.camel at strath.ac.uk> Content-Type: text/plain; charset="UTF-8" On Mon, 2018-07-09 at 14:57 -0400, Frederick Stock wrote: > Another option is to request Apple to support the OFFLINE flag in the > SMB protocol. ?The more Mac customers making such a request (I have > asked others to do likewise) might convince Apple to add this > checking to their SMB client. > And we have a winner. The only workable solution is to get Apple to Finder to support the OFFLINE flag. However good luck getting Apple to actually do anything. An alternative approach might be to somehow detect the client connecting is running MacOS and prohibit recalls for them. 
However I am not sure the Samba team would be keen on accepting such patches unless it could be done in say VFS module. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ Message: 2 Date: Tue, 10 Jul 2018 09:08:45 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: Content-Type: text/plain; charset="utf-8" As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson To: gpfsug main discussion list Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Tue, 10 Jul 2018 14:12:02 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] High I/O wait times Message-ID: <1531228322.26036.143.camel at strath.ac.uk> Content-Type: text/plain; charset="UTF-8" On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote: [SNIP] > Interestingly enough, one user showed up waaaayyyyyy more often than > anybody else. ?And many times she was on a node with only one other > user who we know doesn?t access the GPFS filesystem and other times > she was the only user on the node. ? > I have seen on our old HPC system which had been running fine for three years a particular user with a particular piece of software with presumably a particular access pattern trigger a firmware bug in a SAS drive (local disk to the node) that caused it to go offline (dead to the world and power/presence LED off) and only a power cycle of the node would bring it back. At first we through the drives where failing, because what the hell, but in the end a firmware update to the drives and they where fine. The moral of the story is don't rule out wacky access patterns from a single user causing problems. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ Message: 4 Date: Tue, 10 Jul 2018 16:28:57 +0200 From: "Uwe Falke" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: Content-Type: text/plain; charset="ISO-8859-1" Hi Bob, you sure the first added NSD was 1 TB? 
As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 5 Date: Tue, 10 Jul 2018 14:50:54 +0000 From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Same file opened by many nodes / processes Message-ID: <4e038c492713f418242be208532e112f8ea50a9f.camel at qmul.ac.uk> Content-Type: text/plain; charset="utf-8" We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. 
Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Outlook-1466780990.png Type: image/png Size: 6282 bytes Desc: Outlook-1466780990.png URL: From salut4tions at gmail.com Tue Jul 10 16:54:36 2018 From: salut4tions at gmail.com (Jordan Robertson) Date: Tue, 10 Jul 2018 11:54:36 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: To second David's comments: I don't believe changing the max NSD size for a given storage pool is possible (it may be tied to the per-pool allocation mapping?), so if you want to add more dataOnly NSD's to a filesystem and get that error you may need to create a new pool. The tricky bit is that I think this only works with dataOnly NSD's, as dataAndMetadata and metadataOnly NSD's only get added to the system pool which is locked in like any other. -Jordan On Tue, Jul 10, 2018 at 10:28 AM, Uwe Falke wrote: > Hi Bob, > you sure the first added NSD was 1 TB? As often as i created a FS, the max > NSD size was way larger than the one I added initially , not just the > fourfold. > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 10/07/2018 13:59 > Subject: [gpfsug-discuss] Allocation map limits - any way around > this? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > File system was originally created with 1TB NSDs (4) and I want to move it > to one 5TB NSD. Any way around this error? > > mmadddisk fs1 -F new.nsd > > The following disks of proserv will be formatted on node srv-gpfs06: > stor1v5tb85: size 5242880 MB > Extending Allocation Map > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > Allocation map cannot accommodate disks larger than 4194555 MB. > Checking Allocation Map for storage pool Plevel1 > mmadddisk: tsadddisk failed. > Verifying file system configuration information ... > mmadddisk: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > mmadddisk: Command failed. Examine previous error messages to determine > cause. 
> > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Tue Jul 10 16:59:57 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 10 Jul 2018 17:59:57 +0200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi, Peter, in theory, the first node opening a file should remain metanode until it closes the file, regardless how many other nodes open it in between (if all the nodes are within the same cluster). MFT is controlling the caching inodes and - AFAIK - also of indirect blocks. A 200 GiB file will most likely have indirect blocks, but just a few up to some tens, depending on the block size in the file system. The default MFT number is much larger. However, if you say the metanode is changing, that might cause some delays, as all token information has to be passed on to the next metanode (not sure how efficient that election is running). Having said that it could help if you use a dedicated node having the file open from start and all the time - this should prevent new metanodes being elected. If you do not get told a solution, you might want to run a trace of the mmbackup scan (maybe once with jobs accessing the file, once without). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 10/07/2018 16:51 Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. 
It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Tue Jul 10 17:15:14 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 10 Jul 2018 12:15:14 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: I would start by making sure that the application(s)... open the file O_RDONLY and then you may want to fiddle with the GPFS atime settings: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm At first I thought "uge" was a typo, but I guess you are referring to: https://supcom.hgc.jp/english/utili_info/manual/uge.html Still not begin familiar, it would be "interesting" to know from a file operations point of view, what's going on in terms of opens, reads, closes : per second. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 17:17:58 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 12:17:58 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. 
Fred Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jordan Robertson To: gpfsug main discussion list Date: 07/10/2018 11:54 AM Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org To second David's comments: I don't believe changing the max NSD size for a given storage pool is possible (it may be tied to the per-pool allocation mapping?), so if you want to add more dataOnly NSD's to a filesystem and get that error you may need to create a new pool. The tricky bit is that I think this only works with dataOnly NSD's, as dataAndMetadata and metadataOnly NSD's only get added to the system pool which is locked in like any other. -Jordan On Tue, Jul 10, 2018 at 10:28 AM, Uwe Falke wrote: Hi Bob, you sure the first added NSD was 1 TB? As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 10 17:29:42 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 10 Jul 2018 16:29:42 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> Right - but it doesn?t give me the answer on how to best get around it. :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Tuesday, July 10, 2018 at 11:18 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Tue Jul 10 17:59:17 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Tue, 10 Jul 2018 12:59:17 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> References: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> Message-ID: <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> I would as I suggested add the new NSD into a new pool in the same filesystem. Then I would migrate all the files off the old pool onto the new one. At this point you can deldisk the old ones or decide what else you?d want to do with them. -- ddj Dave Johnson > On Jul 10, 2018, at 12:29 PM, Oesterlin, Robert wrote: > > Right - but it doesn?t give me the answer on how to best get around it. :-) > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > From: on behalf of IBM Spectrum Scale > Reply-To: gpfsug main discussion list > Date: Tuesday, July 10, 2018 at 11:18 AM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? > > The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. > > Fred > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scrusan at ddn.com Tue Jul 10 18:09:48 2018 From: scrusan at ddn.com (Steve Crusan) Date: Tue, 10 Jul 2018 17:09:48 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: <4E48904C-5B98-485B-B577-85532C7593A8@ddn.com> I?ve used ?preferDesignatedMnode=1? in the past, but that was for a specific usecase, and that would have to come from the direction of support. 
I guess if you wanted to test your metanode theory, you could open that file (and keep it open) on node from a different remote cluster, or one of your local NSD servers and see what kind of results you get out of it. ---- Steve Crusan scrusan at ddn.com (719) 695-3190 From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Tuesday, July 10, 2018 at 11:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes I would start by making sure that the application(s)... open the file O_RDONLY and then you may want to fiddle with the GPFS atime settings: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm At first I thought "uge" was a typo, but I guess you are referring to: https://supcom.hgc.jp/english/utili_info/manual/uge.html Still not begin familiar, it would be "interesting" to know from a file operations point of view, what's going on in terms of opens, reads, closes : per second. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 18:19:47 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 13:19:47 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? 
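As an aside, the dedicated-node idea mentioned earlier in the thread (keep the file open on one chosen node so the metanode role stays put) needs nothing more than a long-lived read-only file descriptor; the path below is made up:

   # on the node that should keep the metanode role, in a shell or screen
   # session that stays alive for the duration of the array job
   exec 9< /gpfs/data/shared/bigfile.dat

Whether that actually stops the metanode from moving is exactly what is in question here, so treat it as an experiment rather than a fix.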
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Tue Jul 10 19:35:28 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Tue, 10 Jul 2018 11:35:28 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi, many thanks to all of the suggestions for how to deal with this issue. Ftr, I tried this mmchnode --noquorum -N --force on the node that was reinstalled which reinstated some of the communications between the cluster nodes, but then when I restarted the cluster, communications begain to fail again, complaining about not enough CCR nodes for quorum. I ended up reinstalling the cluster since at this point the nodes couldn't mount the remote data and I thought it would be faster. Thanks again for all of the responses, Renata Dart SLAC National Accelerator Lab On Wed, 27 Jun 2018, IBM Spectrum Scale wrote: > >Hi Renata, > >You may want to reduce the set of quorum nodes. If your version supports >the --force option, you can run > >mmchnode --noquorum -N --force > >It is a good idea to configure tiebreaker disks in a cluster that has only >2 quorum nodes. 
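For reference, tiebreaker disks are configured with mmchconfig; a minimal sketch with placeholder NSD names (up to three disks can be listed, and on older code levels the daemons may need to be down across the cluster while changing this):

   mmchconfig tiebreakerDisks="nsd001;nsd002;nsd003"
   mmlsconfig tiebreakerDisks      # confirm the setting
   mmchconfig tiebreakerDisks=no   # reverts to node-based quorum only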
> >Regards, The Spectrum Scale (GPFS) team > >------------------------------------------------------------------------------------------------------------------ > >If you feel that your question can benefit other users of Spectrum Scale >(GPFS), then please post it to the public IBM developerWroks Forum at >https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > >If your query concerns a potential software error in Spectrum Scale (GPFS) >and you have an IBM software maintenance contract please contact >1-800-237-5511 in the United States or your local IBM Service Center in >other countries. > >The forum is informally monitored as time permits and should not be used >for priority messages to the Spectrum Scale (GPFS) team. > > > >From: Renata Maria Dart >To: gpfsug-discuss at spectrumscale.org >Date: 06/27/2018 02:21 PM >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues >Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving >data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine >cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine >cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > From bbanister at jumptrading.com Tue Jul 10 21:50:23 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 10 Jul 2018 20:50:23 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> References: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> Message-ID: +1 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of david_johnson at brown.edu Sent: Tuesday, July 10, 2018 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Note: External Email ________________________________ I would as I suggested add the new NSD into a new pool in the same filesystem. Then I would migrate all the files off the old pool onto the new one. At this point you can deldisk the old ones or decide what else you?d want to do with them. 
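A rough sketch of that sequence, with invented names for the old NSDs and the new pool (the new 5 TB NSD name and the full pool come from the original post; verify the stanza and policy syntax against your release before using):

   # newdisk.stanza contains one line putting the new NSD into a new data pool:
   #   %nsd: nsd=stor1v5tb85 usage=dataOnly failureGroup=1 pool=data2
   mmadddisk fs1 -F newdisk.stanza

   # drain.pol contains a single migration rule:
   #   RULE 'drain' MIGRATE FROM POOL 'Plevel1' TO POOL 'data2'
   mmapplypolicy fs1 -P drain.pol -I yes

   # once Plevel1 is empty, drop the old 1 TB NSDs
   mmdeldisk fs1 "old1tb01;old1tb02;old1tb03;old1tb04"

If the installed placement policy currently points new files at Plevel1, it would also need a SET POOL 'data2' rule installed via mmchpolicy.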
-- ddj Dave Johnson On Jul 10, 2018, at 12:29 PM, Oesterlin, Robert > wrote: Right - but it doesn?t give me the answer on how to best get around it. :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of IBM Spectrum Scale > Reply-To: gpfsug main discussion list > Date: Tuesday, July 10, 2018 at 11:18 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. Fred _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jul 10 22:06:27 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 21:06:27 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. 
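(Those mmpmon counters can be sampled per node with something along these lines, repeat count and delay chosen arbitrarily, check the mmpmon documentation for your level:

   echo fs_io_s > /tmp/mmpmon.cmd
   mmpmon -i /tmp/mmpmon.cmd -r 12 -d 5000   # per-file-system bytes read/written, 12 samples, 5 s apart

running it on a few of the suspect clients shows which of them dominate the reads.)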
It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 10 22:12:16 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 10 Jul 2018 21:12:16 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? 
In-Reply-To: References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> Message-ID: <5565130575454bf7a80802ecd55faec3@jumptrading.com> I know we are trying to be helpful, but suggesting that admins mess with undocumented, dangerous commands isn?t a good idea. If directed from an IBM support person with explicit instructions, then good enough, IFF it?s really required and worth the risk! I think the Kum?s suggestions are definitely a right way to handle this. In general, avoid running ts* commands unless directed by somebody that knows exactly what they are doing and understands your issue in great detail!! Just a word to the wise.. 2 cents? etc, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Tuesday, July 10, 2018 8:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Note: External Email ________________________________ As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson > To: gpfsug main discussion list > Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: > on behalf of "makaplan at us.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
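For completeness, the disk numbers in output like "1:45255923200 13:59403784320" can also be translated to NSD names with a documented, read-only command; mmlsdisk with the -L option prints a disk id column next to each disk name, e.g.

   mmlsdisk fs1 -L   # file system name is a placeholder

which avoids reaching for tsdbfs just to do the number-to-NSD mapping.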
-------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jul 10 22:23:34 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 21:23:34 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, , Message-ID: Oh the cluster is 296 nodes currently with a set size of 300 (mmcrfs -n 300) We're currently looking to upgrade the 1G connected nodes to 10G within the next few months. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Peter Childs wrote ---- The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 23:15:01 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 18:15:01 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, , Message-ID: Regarding the permissions on the file I assume you are not using ACLs, correct? If you are then you would need to check what the ACL allows. Is your metadata on separate NSDs? Having metadata on separate NSDs, and preferably fast NSDs, would certainly help your mmbackup scanning. Have you looked at the information from netstat or similar network tools to see how your network is performing? Faster networks generally require a bit of OS tuning and some GPFS tuning to optimize their performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: gpfsug main discussion list Date: 07/10/2018 05:23 PM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org Oh the cluster is 296 nodes currently with a set size of 300 (mmcrfs -n 300) We're currently looking to upgrade the 1G connected nodes to 10G within the next few months. 
Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Peter Childs wrote ---- The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. 
It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Wed Jul 11 13:30:16 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 11 Jul 2018 14:30:16 +0200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From heiner.billich at psi.ch Wed Jul 11 14:40:46 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Wed, 11 Jul 2018 13:40:46 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown Message-ID: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Hello, I have two nodes which hang on ?mmshutdown?, in detail the command ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I wonder if this looks familiar to somebody? Is it a known bug? I can avoid the issue if I reduce pagepool from 128G to 64G. Running ?systemctl stop gpfs? shows the same issue. It forcefully terminates after a while, but ?rmmod? stays stuck. Two functions cxiReleaseAndForgetPages and put_page seem to be involved, the first part of gpfs, the second a kernel call. The servers have 256G memory and 72 (virtual) cores each. I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. I can try to switch back to 5.0.0 Thank you & kind regards, Heiner Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum Scale service process is running Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is not able to form a quorum with the other available nodes. Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 [preauth] Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[rmmod:2695] Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] [] put_compound_page+0xc3/0x174 Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: 00000246 Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: 00000000fae3d201 RCX: 0000000000000284 Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: 0000000000000246 RDI: ffffea003d478000 Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: ffff881ffae3d1e0 R09: 0000000180800059 Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: ffffea007feb8f40 R12: 00000000fae3d201 Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: 0000000000000000 R15: ffff88161977bd40 Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:12:41 node-1.x.y kernel: Call Trace: Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 Jul 11 14:12:41 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] ? 
kmem_cache_free+0x1e2/0x200 Jul 11 14:12:41 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:12:41 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 41 0f ba 2c Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. Terminating. Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [rmmod:2695] Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jul 11 14:13:27 node-1.x.y kernel: { Jul 11 14:13:27 node-1.x.y kernel: 28 Jul 11 14:13:27 node-1.x.y kernel: } Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, g=267734, c=267733, q=36089) Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: Jul 11 14:13:27 node-1.x.y kernel: rmmod R Jul 11 14:13:27 node-1.x.y kernel: running task Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __free_slab+0xdc/0x200 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? 
system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] [] __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: 00000282 Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: 0000000000000135 RCX: 00000000000001c1 Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: 0000000000000246 RDI: ffffea00650e7040 Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: ffff881ffae3df60 R09: 0000000180800052 Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: ffffea007feb8f40 R12: ffff881ffae3df60 Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: 00000000fae3db01 R15: ffffea007feb8f40 Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 48 89 fb f6 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Wed Jul 11 14:47:06 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 11 Jul 2018 06:47:06 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Message-ID: Hi, what does numactl -H report ? also check if this is set to yes : root at fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello, > > > > I have two nodes which hang on ?mmshutdown?, in detail the command > ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I > wonder if this looks familiar to somebody? Is it a known bug? I can avoid > the issue if I reduce pagepool from 128G to 64G. > > > > Running ?systemctl stop gpfs? shows the same issue. It forcefully > terminates after a while, but ?rmmod? stays stuck. > > > > Two functions cxiReleaseAndForgetPages and put_page seem to be involved, > the first part of gpfs, the second a kernel call. > > > > The servers have 256G memory and 72 (virtual) cores each. > > I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. > > > > I can try to switch back to 5.0.0 > > > > Thank you & kind regards, > > > > Heiner > > > > > > > > Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum > Scale service process not running on this node. Normal operation cannot be > done > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum > Scale service process is running > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is > not able to form a quorum with the other available nodes. > > Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 > [preauth] > > > > Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 23s! 
[rmmod:2695] > > > > Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 > i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > > Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe > mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp > crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca > dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] > > Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] > [] put_compound_page+0xc3/0x174 > > Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: > 00000246 > > Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: > 00000000fae3d201 RCX: 0000000000000284 > > Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: > 0000000000000246 RDI: ffffea003d478000 > > Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: > ffff881ffae3d1e0 R09: 0000000180800059 > > Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: > ffffea007feb8f40 R12: 00000000fae3d201 > > Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: > 0000000000000000 R15: ffff88161977bd40 > > Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > > > Jul 11 14:12:41 node-1.x.y kernel: Call Trace: > > Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] ? 
> kmem_cache_free+0x1e2/0x200 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:12:41 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff > ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 > f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 > 41 0f ba 2c > > > > Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. > Terminating. > > > > Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 21s! [rmmod:2695] > > > > Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > > Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on > CPUs/tasks: > > Jul 11 14:13:27 node-1.x.y kernel: { > > Jul 11 14:13:27 node-1.x.y kernel: 28 > > Jul 11 14:13:27 node-1.x.y kernel: } > > Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, > g=267734, c=267733, q=36089) > > Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: > > Jul 11 14:13:27 node-1.x.y kernel: rmmod R > > Jul 11 14:13:27 node-1.x.y kernel: running task > > Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __free_slab+0xdc/0x200 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > mmfs+0xc85/0xca0 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter > > Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl > lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul > mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa > pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: tracedev] > > Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] > [] __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: > 00000282 > > Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: > 0000000000000135 RCX: 00000000000001c1 > > Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: > 0000000000000246 RDI: ffffea00650e7040 > > Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: > ffff881ffae3df60 R09: 0000000180800052 > > Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: > ffffea007feb8f40 R12: ffff881ffae3df60 > > Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: > 00000000fae3db01 R15: ffffea007feb8f40 > > Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f > 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 > df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 > 48 89 fb f6 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 11 15:32:37 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 11 Jul 2018 14:32:37 +0000 Subject: [gpfsug-discuss] mmdiag --iohist question Message-ID: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacekm at img.cas.cz Thu Jul 12 07:46:22 2018 From: zacekm at img.cas.cz (Michal Zacek) Date: Thu, 12 Jul 2018 08:46:22 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory Message-ID: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal -------------- next part -------------- A non-text attachment was scrubbed... 
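Coming back to the mmdiag --iohist question earlier in this digest: a rough sketch of the kind of wrapper described there, written so that a blank client IP field is skipped rather than raising an error, could look like the lines below. The column positions, the threshold, and the getent/squeue lookups are assumptions and would need to be adapted to the local iohist layout and scheduler setup.

#!/bin/bash
# Sketch only: flag long I/Os from "mmdiag --iohist" and map the client IP
# to a hostname and its SLURM jobs. The wait-time and client-IP columns are
# assumed positions and must be checked against the local iohist output.
THRESHOLD_MS=500

/usr/lpp/mmfs/bin/mmdiag --iohist | awk -v t="$THRESHOLD_MS" '
    $8 ~ /^[0-9.]+$/ && $8+0 > t { print $8, $NF }   # wait_ms, last field
' | while read -r wait_ms client_ip; do
    # If the last field is not an IPv4 address, the client IP column was
    # blank (e.g. I/O issued by the NSD server itself) -- skip the line
    # instead of failing on it.
    if [[ ! "$client_ip" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
        echo "wait ${wait_ms}ms: no client IP recorded (server-side I/O?)"
        continue
    fi
    host=$(getent hosts "$client_ip" | awk '{print $2}')
    echo "wait ${wait_ms}ms from ${host:-$client_ip}:"
    # list jobs currently running on that node (SLURM)
    squeue -w "${host:-$client_ip}" -h -o "  job %A user %u (%j)"
done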
Name: smime.p7s Type: application/pkcs7-signature Size: 3776 bytes Desc: Elektronicky podpis S/MIME URL: From S.J.Thompson at bham.ac.uk Thu Jul 12 09:04:11 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 12 Jul 2018 08:04:11 +0000 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> Message-ID: <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon ?On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal From Renar.Grunenberg at huk-coburg.de Thu Jul 12 09:17:37 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 12 Jul 2018 08:17:37 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Message-ID: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... 
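On the question above, a short first pass along the following lines usually shows which disks are affected and why, before anything is restarted. 'fs1' is a placeholder for the affected file system, and the sequence assumes the disks are only marked down, not physically failed.

# Which disks are not up/ready, and which NSD servers back them
mmlsdisk fs1 -e
mmlsnsd -f fs1

# The daemon logs the reason on the NSD servers (exact wording varies
# between releases)
grep -i down /var/adm/ras/mmfs.log.latest | tail -20

# Once the servers and their paths to the disks are back, bring the
# down disks up again; GPFS scans and recovers them as needed
mmchdisk fs1 start -a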
URL: From smita.raut at in.ibm.com Thu Jul 12 09:39:20 2018 From: smita.raut at in.ibm.com (Smita J Raut) Date: Thu, 12 Jul 2018 14:09:20 +0530 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Message-ID: If ABCD is not a fileset then below rule can be used- RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE ' /gpfs/gpfs01/ABCD/%' Thanks, Smita From: Simon Thompson To: gpfsug main discussion list Date: 07/12/2018 01:34 PM Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 12 09:40:06 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 12 Jul 2018 08:40:06 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Message-ID: <34BB4D15-5F76-453B-AC8C-FF5096133296@bham.ac.uk> How are the disks attached? We have some IB/SRP storage that is sometimes a little slow to appear in multipath and have seen this in the past (we since set autoload=off and always check multipath before restarting GPFS on the node). Simon From: on behalf of "Renar.Grunenberg at huk-coburg.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 12 July 2018 at 09:17 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacekm at img.cas.cz Thu Jul 12 09:49:38 2018 From: zacekm at img.cas.cz (Michal Zacek) Date: Thu, 12 Jul 2018 10:49:38 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Message-ID: <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> That's perfect, thank you both. Best regards Michal Dne 12.7.2018 v 10:39 Smita J Raut napsal(a): > If ABCD is not a fileset then below rule can be used- > > RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE > '/gpfs/gpfs01/ABCD/%' > > Thanks, > Smita > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 07/12/2018 01:34 PM > Subject: Re: [gpfsug-discuss] File placement rule for new files in > directory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Is ABCD a fileset? If so, its easy with something like: > > RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') > > Simon > > On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of zacekm at img.cas.cz" on behalf of zacekm at img.cas.cz> wrote: > > ? ?Hello, > > ? ?it is possible to create file placement policy for new files in one > ? ?directory? I need something like this --> All new files created in > ? ?directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". > ? ?Thanks. > > ? ?Best regards, > ? ?Michal > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3776 bytes Desc: Elektronicky podpis S/MIME URL: From Achim.Rehor at de.ibm.com Thu Jul 12 10:47:26 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Thu, 12 Jul 2018 11:47:26 +0200 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
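For reference, a sketch of how such a fileset-based rule could be put in place and checked before it goes live. All names here (file system gpfs01, fileset ABCD, pools fastdata/system) are examples, and the policy is validated with -I test before being installed.

# Make /gpfs/gpfs01/ABCD a fileset and route new files in it to 'fastdata'
# (the junction path must not already exist when linking)
mmcrfileset gpfs01 ABCD
mmlinkfileset gpfs01 ABCD -J /gpfs/gpfs01/ABCD

cat > /tmp/placement.pol <<'EOF'
RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD')
RULE 'default' SET POOL 'system'
EOF

# Validate first, then install and verify
mmchpolicy gpfs01 /tmp/placement.pol -I test
mmchpolicy gpfs01 /tmp/placement.pol
mmlspolicy gpfs01 -L

A default placement rule is kept at the end so that files created outside the fileset still land in a defined pool.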
Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Thu Jul 12 11:01:29 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 12 Jul 2018 10:01:29 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Message-ID: <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> Hallo Achim, hallo Simon, first thanks for your answers. I think Achims answers map these at best. The nsd-servers (only 2) for these disk were mistakenly restart in a same time window. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Achim Rehor Gesendet: Donnerstag, 12. Juli 2018 11:47 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Hi Renar, whenever an access to a NSD happens, there is a potential that the node cannot access the disk, so if the (only) NSD server is down, there will be no chance to access the disk, and the disk will be set down. If you have twintailed disks, the 'second' (or possibly some more) NSD server will be asked, switching to networked access, and in that case only if that also fails, the disk will be set to down as well. Not sure how your setup is, but if you reboot 2 NSD servers, and some client possibly did IO to a file served by just these 2, then the 'down' state would be explainable. Rebooting of an NSD server should never set a disk to down, except, he was the only one serving that NSD. Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:image001.gif at 01D419D7.A9373E60] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 
7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 12/07/2018 10:17 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
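Following on from the explanation above, it is worth checking the NSD server list of every disk before taking two servers down at the same time, so that no NSD loses all of its servers at once. Something like:

# NSD -> server list, per file system (check that no disk has its complete
# server list inside the set of nodes being rebooted)
mmlsnsd

# NSD -> local device on each server, useful to spot missing paths
mmlsnsd -m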
Name: image001.gif Type: image/gif Size: 7182 bytes Desc: image001.gif URL: From scale at us.ibm.com Thu Jul 12 12:33:39 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 12 Jul 2018 07:33:39 -0400 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> Message-ID: Just to follow up on the question about where to learn why a NSD is marked down you should see a message in the GPFS log, /var/adm/ras/mmfs.log.* Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Date: 07/12/2018 06:01 AM Subject: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Achim, hallo Simon, first thanks for your answers. I think Achims answers map these at best. The nsd-servers (only 2) for these disk were mistakenly restart in a same time window. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Achim Rehor Gesendet: Donnerstag, 12. 
Juli 2018 11:47 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Hi Renar, whenever an access to a NSD happens, there is a potential that the node cannot access the disk, so if the (only) NSD server is down, there will be no chance to access the disk, and the disk will be set down. If you have twintailed disks, the 'second' (or possibly some more) NSD server will be asked, switching to networked access, and in that case only if that also fails, the disk will be set to down as well. Not sure how your setup is, but if you reboot 2 NSD servers, and some client possibly did IO to a file served by just these 2, then the 'down' state would be explainable. Rebooting of an NSD server should never set a disk to down, except, he was the only one serving that NSD. Mit freundlichen Gr??en / Kind regards Achim Rehor Software Technical Support Specialist AIX/ Emea HPC Support IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 12/07/2018 10:17 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
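A quick way to pull those entries out on the NSD servers, assuming the default log location and keeping in mind that the exact message text differs between releases:

# Recent daemon log on this node
grep -iE 'disk|down|err' /var/adm/ras/mmfs.log.latest | tail -40

# Rotated copies, in case the reboot happened a while ago
ls -ltr /var/adm/ras/mmfs.log.*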
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From UWEFALKE at de.ibm.com Thu Jul 12 14:16:23 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 12 Jul 2018 15:16:23 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz><8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> Message-ID: If that has not changed, then: PATH_NAME is not usable for placement policies. Only the FILESET_NAME attribute is accepted. One might think, that PATH_NAME is as known on creating a new file as is FILESET_NAME, but for some reason the documentation says: "When file attributes are referenced in initial placement rules, only the following attributes are valid: FILESET_NAME, GROUP_ID, NAME, and USER_ID. " Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 12/07/2018 10:49 Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org That's perfect, thank you both. Best regards Michal Dne 12.7.2018 v 10:39 Smita J Raut napsal(a): If ABCD is not a fileset then below rule can be used- RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE '/gpfs/gpfs01/ABCD/%' Thanks, Smita From: Simon Thompson To: gpfsug main discussion list Date: 07/12/2018 01:34 PM Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. 
Best regards, Michal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "smime.p7s" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heiner.billich at psi.ch Thu Jul 12 14:30:43 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 12 Jul 2018 13:30:43 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Message-ID: <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Hello Sven, Thank you. I did enable numaMemorInterleave but the issues stays. In the meantime I switched to version 5.0.0-2 just to see if it?s version dependent ? it?s not. All gpfs filesystems are unmounted when this happens. At shutdown I often need to do a hard reset to force a reboot ? o.k., I never waited more than 5 minutes once I saw a hang, maybe it would recover after some more time. ?rmmod mmfs26? doesn?t hang all the times, maybe at every other shutdown or mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow, command like ?ps -efH? or ?history? take a long time and some mm commands just block, a few times the system gets completely inaccessible. I?ll reinstall the systems and move back to 4.2.3-8 and see if this is a stable configuration to start from an to rule out any hardware/BIOS issues. I append output from numactl -H below. Cheers, Heiner Test with 5.0.0-2 [root at xbl-ces-2 ~]# numactl -H available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 node 0 size: 130942 MB node 0 free: 60295 MB node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 node 1 size: 131072 MB node 1 free: 60042 MB node distances: node 0 1 0: 10 21 1: 21 10 [root at xbl-ces-2 ~]# mmdiag --config | grep numaM ! numaMemoryInterleave yes # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=/dev/mapper/vg_root-lv_root ro crashkernel=auto rd.lvm.lv=vg_root/lv_root console=tty0 console=ttyS0,115200 nosmap Example output of ps -efH during mmshutdown when rmmod did hang (last line) This is with 5.0.0-2. As I see all gpfs processe already terminated, just root 1 0 0 14:30 ? 00:00:10 /usr/lib/systemd/systemd --switched-root --system --deserialize 21 root 1035 1 0 14:30 ? 00:00:02 /usr/lib/systemd/systemd-journald root 1055 1 0 14:30 ? 00:00:00 /usr/sbin/lvmetad -f root 1072 1 0 14:30 ? 00:00:11 /usr/lib/systemd/systemd-udevd root 1478 1 0 14:31 ? 00:00:00 /usr/sbin/sssd -i -f root 1484 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain D.PSI.CH --uid 0 --gid 0 --debug-to-files root 1486 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files root 1487 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files root 1479 1 0 14:31 ? 00:00:00 /usr/sbin/rasdaemon -f -r root 1482 1 0 14:31 ? 00:00:04 /usr/sbin/irqbalance --foreground dbus 1483 1 0 14:31 ? 00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation root 1496 1 0 14:31 ? 
00:00:00 /usr/sbin/smartd -n -q never root 1498 1 0 14:31 ? 00:00:00 /usr/sbin/gssproxy -D nscd 1507 1 0 14:31 ? 00:00:01 /usr/sbin/nscd nrpe 1526 1 0 14:31 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d root 1531 1 0 14:31 ? 00:00:00 /usr/lib/systemd/systemd-logind root 1533 1 0 14:31 ? 00:00:00 /usr/sbin/rpc.gssd root 1803 1 0 14:31 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220 root 1804 1 0 14:31 tty1 00:00:00 /sbin/agetty --noclear tty1 linux root 2405 1 0 14:32 ? 00:00:00 /sbin/dhclient -q -cf /etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l root 2461 1 0 14:32 ? 00:00:00 /usr/sbin/sshd -D root 11561 2461 0 14:35 ? 00:00:00 sshd: root at pts/0 root 11565 11561 0 14:35 pts/0 00:00:00 -bash root 16024 11565 0 14:50 pts/0 00:00:05 ps -efH root 11609 2461 0 14:35 ? 00:00:00 sshd: root at pts/1 root 11644 11609 0 14:35 pts/1 00:00:00 -bash root 2718 1 0 14:32 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 0 no root 2758 1 0 14:32 ? 00:00:00 /usr/libexec/postfix/master -w postfix 2785 2758 0 14:32 ? 00:00:00 pickup -l -t unix -u postfix 2786 2758 0 14:32 ? 00:00:00 qmgr -l -t unix -u root 3174 1 0 14:32 ? 00:00:00 /usr/sbin/crond -n ntp 3179 1 0 14:32 ? 00:00:00 /usr/sbin/ntpd -u ntp:ntp -g root 3915 1 3 14:32 ? 00:00:33 python /usr/lpp/mmfs/bin/mmsysmon.py root 13618 1 0 14:36 ? 00:00:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes no root 15936 1 0 14:49 pts/1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs root 15992 15936 0 14:49 pts/1 00:00:00 /sbin/rmmod mmfs26 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday 11 July 2018 at 15:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown Hi, what does numactl -H report ? also check if this is set to yes : root at fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) > wrote: Hello, I have two nodes which hang on ?mmshutdown?, in detail the command ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I wonder if this looks familiar to somebody? Is it a known bug? I can avoid the issue if I reduce pagepool from 128G to 64G. Running ?systemctl stop gpfs? shows the same issue. It forcefully terminates after a while, but ?rmmod? stays stuck. Two functions cxiReleaseAndForgetPages and put_page seem to be involved, the first part of gpfs, the second a kernel call. The servers have 256G memory and 72 (virtual) cores each. I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. I can try to switch back to 5.0.0 Thank you & kind regards, Heiner Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum Scale service process is running Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is not able to form a quorum with the other available nodes. Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 [preauth] Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[rmmod:2695] Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] [] put_compound_page+0xc3/0x174 Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: 00000246 Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: 00000000fae3d201 RCX: 0000000000000284 Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: 0000000000000246 RDI: ffffea003d478000 Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: ffff881ffae3d1e0 R09: 0000000180800059 Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: ffffea007feb8f40 R12: 00000000fae3d201 Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: 0000000000000000 R15: ffff88161977bd40 Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:12:41 node-1.x.y kernel: Call Trace: Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 Jul 11 14:12:41 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] ? 
kmem_cache_free+0x1e2/0x200 Jul 11 14:12:41 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:12:41 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 41 0f ba 2c Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. Terminating. Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [rmmod:2695] Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jul 11 14:13:27 node-1.x.y kernel: { Jul 11 14:13:27 node-1.x.y kernel: 28 Jul 11 14:13:27 node-1.x.y kernel: } Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, g=267734, c=267733, q=36089) Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: Jul 11 14:13:27 node-1.x.y kernel: rmmod R Jul 11 14:13:27 node-1.x.y kernel: running task Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __free_slab+0xdc/0x200 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? 
system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] [] __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: 00000282 Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: 0000000000000135 RCX: 00000000000001c1 Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: 0000000000000246 RDI: ffffea00650e7040 Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: ffff881ffae3df60 R09: 0000000180800052 Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: ffffea007feb8f40 R12: ffff881ffae3df60 Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: 00000000fae3db01 R15: ffffea007feb8f40 Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 48 89 fb f6 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
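When the hang is reproducible, it helps to capture where rmmod is stuck while it happens, from a second shell on the affected node. A rough capture script, assuming root access and an enabled sysrq interface, might be:

#!/bin/bash
# Rough sketch: snapshot thread state while "rmmod mmfs26" hangs.
# Output location is an arbitrary example.
OUT=/tmp/rmmod-hang.$(date +%s)
mkdir -p "$OUT"

# Per-thread process list and CPU usage
ps -eLo pid,tid,ppid,stat,pcpu,wchan:32,comm > "$OUT/ps-threads.txt"
top -b -H -n 1 > "$OUT/top-threads.txt"

# Kernel stack of the stuck rmmod process
RMMOD_PID=$(pgrep -x rmmod | head -1)
[ -n "$RMMOD_PID" ] && cat /proc/$RMMOD_PID/stack > "$OUT/rmmod-stack.txt"

# Dump blocked tasks to the kernel log (needs kernel.sysrq enabled)
echo w > /proc/sysrq-trigger
dmesg | tail -200 > "$OUT/dmesg-tail.txt"

The /proc/<pid>/stack and sysrq-w output should show whether rmmod is still looping in the same put_page / cxiReleaseAndForgetPages path seen in the soft-lockup messages, which is useful data to attach to a support case.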
URL: From oehmes at gmail.com Thu Jul 12 14:40:15 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 12 Jul 2018 06:40:15 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Message-ID: if that happens it would be interesting what top reports start top in a large resolution window (like 330x80) , press shift-H , this will break it down per Thread, also press 1 to have a list of each cpu individually and see if you can either spot one core on the top list with 0% idle or on the thread list on the bottom if any of the threads run at 100% core speed. attached is a screenshot which columns to look at , this system is idle, so nothing to see, just to show you where to look does this machine by any chance has either large maxfilestochache or is a token server ? [image: image.png] sven On Thu, Jul 12, 2018 at 6:30 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello Sven, > > > > Thank you. I did enable numaMemorInterleave but the issues stays. > > > > In the meantime I switched to version 5.0.0-2 just to see if it?s version > dependent ? it?s not. All gpfs filesystems are unmounted when this happens. > > > > At shutdown I often need to do a hard reset to force a reboot ? o.k., I > never waited more than 5 minutes once I saw a hang, maybe it would recover > after some more time. > > > > ?rmmod mmfs26? doesn?t hang all the times, maybe at every other shutdown > or mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow, > command like ?ps -efH? or ?history? take a long time and some mm commands > just block, a few times the system gets completely inaccessible. > > > > I?ll reinstall the systems and move back to 4.2.3-8 and see if this is a > stable configuration to start from an to rule out any hardware/BIOS issues. > > > > I append output from numactl -H below. > > > > Cheers, > > > > Heiner > > > > Test with 5.0.0-2 > > > > [root at xbl-ces-2 ~]# numactl -H > > available: 2 nodes (0-1) > > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 > 42 43 44 45 46 47 48 49 50 51 52 53 > > node 0 size: 130942 MB > > node 0 free: 60295 MB > > node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 > 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 > > node 1 size: 131072 MB > > node 1 free: 60042 MB > > node distances: > > node 0 1 > > 0: 10 21 > > 1: 21 10 > > > > [root at xbl-ces-2 ~]# mmdiag --config | grep numaM > > ! numaMemoryInterleave yes > > > > # cat /proc/cmdline > > BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 > root=/dev/mapper/vg_root-lv_root ro crashkernel=auto rd.lvm.lv=vg_root/lv_root > console=tty0 console=ttyS0,115200 nosmap > > > > > > Example output of ps -efH during mmshutdown when rmmod did hang (last > line) This is with 5.0.0-2. As I see all gpfs processe already terminated, > just > > > > root 1 0 0 14:30 ? 00:00:10 /usr/lib/systemd/systemd > --switched-root --system --deserialize 21 > > root 1035 1 0 14:30 ? 00:00:02 > /usr/lib/systemd/systemd-journald > > root 1055 1 0 14:30 ? 00:00:00 /usr/sbin/lvmetad -f > > root 1072 1 0 14:30 ? 00:00:11 > /usr/lib/systemd/systemd-udevd > > root 1478 1 0 14:31 ? 00:00:00 /usr/sbin/sssd -i -f > > root 1484 1478 0 14:31 ? 00:00:00 > /usr/libexec/sssd/sssd_be --domain D.PSI.CH --uid 0 --gid 0 > --debug-to-files > > root 1486 1478 0 14:31 ? 
00:00:00 > /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files > > root 1487 1478 0 14:31 ? 00:00:00 > /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files > > root 1479 1 0 14:31 ? 00:00:00 /usr/sbin/rasdaemon -f -r > > root 1482 1 0 14:31 ? 00:00:04 /usr/sbin/irqbalance > --foreground > > dbus 1483 1 0 14:31 ? 00:00:00 /bin/dbus-daemon > --system --address=systemd: --nofork --nopidfile --systemd-activation > > root 1496 1 0 14:31 ? 00:00:00 /usr/sbin/smartd -n -q > never > > root 1498 1 0 14:31 ? 00:00:00 /usr/sbin/gssproxy -D > > nscd 1507 1 0 14:31 ? 00:00:01 /usr/sbin/nscd > > nrpe 1526 1 0 14:31 ? 00:00:00 /usr/sbin/nrpe -c > /etc/nagios/nrpe.cfg -d > > root 1531 1 0 14:31 ? 00:00:00 > /usr/lib/systemd/systemd-logind > > root 1533 1 0 14:31 ? 00:00:00 /usr/sbin/rpc.gssd > > root 1803 1 0 14:31 ttyS0 00:00:00 /sbin/agetty --keep-baud > 115200 38400 9600 ttyS0 vt220 > > root 1804 1 0 14:31 tty1 00:00:00 /sbin/agetty --noclear > tty1 linux > > root 2405 1 0 14:32 ? 00:00:00 /sbin/dhclient -q -cf > /etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l > > root 2461 1 0 14:32 ? 00:00:00 /usr/sbin/sshd -D > > root 11561 2461 0 14:35 ? 00:00:00 sshd: root at pts/0 > > root 11565 11561 0 14:35 pts/0 00:00:00 -bash > > root 16024 11565 0 14:50 pts/0 00:00:05 ps -efH > > root 11609 2461 0 14:35 ? 00:00:00 sshd: root at pts/1 > > root 11644 11609 0 14:35 pts/1 00:00:00 -bash > > root 2718 1 0 14:32 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh > /usr/lpp/mmfs/bin/mmccrmonitor 15 0 no > > root 2758 1 0 14:32 ? 00:00:00 > /usr/libexec/postfix/master -w > > postfix 2785 2758 0 14:32 ? 00:00:00 pickup -l -t unix -u > > postfix 2786 2758 0 14:32 ? 00:00:00 qmgr -l -t unix -u > > root 3174 1 0 14:32 ? 00:00:00 /usr/sbin/crond -n > > ntp 3179 1 0 14:32 ? 00:00:00 /usr/sbin/ntpd -u > ntp:ntp -g > > root 3915 1 3 14:32 ? 00:00:33 python > /usr/lpp/mmfs/bin/mmsysmon.py > > root 13618 1 0 14:36 ? 00:00:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes > no > > root 15936 1 0 14:49 pts/1 00:00:00 /usr/lpp/mmfs/bin/mmksh > /usr/lpp/mmfs/bin/runmmfs > > root 15992 15936 0 14:49 pts/1 00:00:00 /sbin/rmmod mmfs26 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > *From: * on behalf of Sven > Oehme > *Reply-To: *gpfsug main discussion list > *Date: *Wednesday 11 July 2018 at 15:47 > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown > > > > Hi, > > > > what does numactl -H report ? > > > > also check if this is set to yes : > > > > root at fab3a:~# mmlsconfig numaMemoryInterleave > > numaMemoryInterleave yes > > > > Sven > > > > On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < > heiner.billich at psi.ch> wrote: > > Hello, > > > > I have two nodes which hang on ?mmshutdown?, in detail the command > ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I > wonder if this looks familiar to somebody? Is it a known bug? I can avoid > the issue if I reduce pagepool from 128G to 64G. > > > > Running ?systemctl stop gpfs? shows the same issue. It forcefully > terminates after a while, but ?rmmod? stays stuck. > > > > Two functions cxiReleaseAndForgetPages and put_page seem to be involved, > the first part of gpfs, the second a kernel call. > > > > The servers have 256G memory and 72 (virtual) cores each. > > I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. 
> > > > I can try to switch back to 5.0.0 > > > > Thank you & kind regards, > > > > Heiner > > > > > > > > Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum > Scale service process not running on this node. Normal operation cannot be > done > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum > Scale service process is running > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is > not able to form a quorum with the other available nodes. > > Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 > [preauth] > > > > Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 23s! [rmmod:2695] > > > > Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 > i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > > Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe > mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp > crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca > dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] > > Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] > [] put_compound_page+0xc3/0x174 > > Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: > 00000246 > > Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: > 00000000fae3d201 RCX: 0000000000000284 > > Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: > 0000000000000246 RDI: ffffea003d478000 > > Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: > ffff881ffae3d1e0 R09: 0000000180800059 > > Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: > ffffea007feb8f40 R12: 00000000fae3d201 > > Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: > 0000000000000000 R15: ffff88161977bd40 > > Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > > > Jul 11 
14:12:41 node-1.x.y kernel: Call Trace: > > Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] ? > kmem_cache_free+0x1e2/0x200 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:12:41 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff > ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 > f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 > 41 0f ba 2c > > > > Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. > Terminating. > > > > Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 21s! [rmmod:2695] > > > > Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > > Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on > CPUs/tasks: > > Jul 11 14:13:27 node-1.x.y kernel: { > > Jul 11 14:13:27 node-1.x.y kernel: 28 > > Jul 11 14:13:27 node-1.x.y kernel: } > > Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, > g=267734, c=267733, q=36089) > > Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: > > Jul 11 14:13:27 node-1.x.y kernel: rmmod R > > Jul 11 14:13:27 node-1.x.y kernel: running task > > Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __free_slab+0xdc/0x200 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> mmfs+0xc85/0xca0 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter > > Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl > lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul > mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa > pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: tracedev] > > Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] > [] __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: > 00000282 > > Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: > 0000000000000135 RCX: 00000000000001c1 > > Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: > 0000000000000246 RDI: ffffea00650e7040 > > Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: > ffff881ffae3df60 R09: 0000000180800052 > > Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: > ffffea007feb8f40 R12: ffff881ffae3df60 > > Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: > 00000000fae3db01 R15: ffffea007feb8f40 > > Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f > 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 > df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 > 48 89 fb f6 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 643176 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Jul 12 15:47:00 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 12 Jul 2018 10:47:00 -0400 Subject: [gpfsug-discuss] File placement rule for new files in directory - PATH_NAME In-Reply-To: References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz><8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk><3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> Message-ID: Why no path name in SET POOL rule? Maybe more than one reason, but consider, that in Unix, the API has the concept of "current directory" and "create a file in the current directory" AND another process or thread may at any time rename (mv!) any directory... So even it you "think" you know the name of the directory in which you are creating a file, you really don't know for sure! So, you may ask, how does the command /bin/pwd work? It follows the parent inode field of each inode, searches the parent for a matching inode, stashes the name in a buffer... When it reaches the root, it prints out the apparent path it found to the root... Which could be wrong by the time it reaches the root! For example: [root@~/gpfs-git]$mkdir -p /tmp/a/b/c/d [root@~/gpfs-git]$cd /tmp/a/b/c/d [root at .../c/d]$/bin/pwd /tmp/a/b/c/d [root at .../c/d]$pwd /tmp/a/b/c/d [root at .../c/d]$mv /tmp/a/b /tmp/a/b2 [root at .../c/d]$pwd /tmp/a/b/c/d # Bash still "thinks" it is in /tmp/a/b/c/d [root at .../c/d]$/bin/pwd /tmp/a/b2/c/d # But /bin/pwd knows better -------------- next part -------------- An HTML attachment was scrubbed... 
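To make the parent-inode walk concrete, here is a rough shell sketch of what /bin/pwd does underneath (GNU stat and ls assumed; it ignores mount-point crossings and whitespace in names, and as Marc notes the answer can already be stale by the time the loop finishes):

d=.
path=
while [ "$(stat -c %i "$d")" != "$(stat -c %i "$d/..")" ]; do
    ino=$(stat -c %i "$d")                                           # inode of the current directory
    name=$(ls -i "$d/.." | awk -v i="$ino" '$1 == i { print $2 }')   # its name inside the parent
    path="/$name$path"
    d="$d/.."
done
echo "${path:-/}"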
URL: From heiner.billich at psi.ch Thu Jul 12 16:21:50 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 12 Jul 2018 15:21:50 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Message-ID: <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> Hello Sven, The machine has maxFilesToCache 204800 (2M) it will become a CES node, hence the higher than default value. It?s just a 3 node cluster with remote cluster mount and no activity (yet). But all three nodes are listed as token server by ?mmdiag ?tokenmgr?. Top showed 100% idle on core 55. This matches the kernel messages about rmmod being stuck on core 55. I didn?t see a dominating thread/process, but many kernel threads showed 30-40% CPU, in sum that used about 50% of all cpu available. This time mmshutdown did return and left the module loaded, next mmstartup tried to remove the ?old? module and got stuck :-( I append two links to screenshots Thank you, Heiner https://pasteboard.co/Hu86DKf.png https://pasteboard.co/Hu86rg4.png If the links don?t work I can post the images to the list. Kernel messages: [ 857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 [ 857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 [ 857.884938] task: ffff883ffafe8fd0 ti: ffff88342af30000 task.ti: ffff88342af30000 [ 857.924120] RIP: 0010:[] [] compound_unlock_irqrestore+0xe/0x20 [ 857.970708] RSP: 0018:ffff88342af33d38 EFLAGS: 00000246 [ 857.999742] RAX: 0000000000000000 RBX: ffff88207ffda068 RCX: 00000000000000e5 [ 858.037165] RDX: 0000000000000246 RSI: 0000000000000246 RDI: 0000000000000246 [ 858.074416] RBP: ffff88342af33d38 R08: 0000000000000000 R09: 0000000000000000 [ 858.111519] R10: ffff88207ffcfac0 R11: ffffea00fff40280 R12: 0000000000000200 [ 858.148421] R13: 00000001fff40280 R14: ffffffff8118cd84 R15: ffff88342af33ce8 [ 858.185845] FS: 00007fc797d1e740(0000) GS:ffff883fff0c0000(0000) knlGS:0000000000000000 [ 858.227062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 858.257819] CR2: 00000000004116d0 CR3: 0000003fc2ec0000 CR4: 00000000001607e0 [ 858.295143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 858.332145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 858.369097] Call Trace: [ 858.384829] [] put_compound_page+0x149/0x174 [ 858.416176] [] put_page+0x45/0x50 [ 858.443185] [] cxiReleaseAndForgetPages+0xda/0x220 [mmfslinux] [ 858.481751] [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux] [ 858.518206] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 858.554438] [] ? 
_raw_spin_lock+0x10/0x30 [ 858.585522] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 858.622670] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 858.659246] [] mmfs+0xc85/0xca0 [mmfs26] [ 858.689379] [] gpfs_clean+0x26/0x30 [mmfslinux] [ 858.722330] [] cleanup_module+0x25/0x30 [mmfs26] [ 858.755431] [] SyS_delete_module+0x19b/0x300 [ 858.786882] [] system_call_fastpath+0x16/0x1b [ 858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 13 5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 [ 859.068528] hrtimer: interrupt took 2877171 ns [ 870.517924] INFO: rcu_sched self-detected stall on CPU { 55} (t=240003 jiffies g=18437 c=18436 q=194992) [ 870.577882] Task dump for CPU 55: [ 870.602837] rmmod R running task 0 16429 16374 0x00000008 [ 870.645206] Call Trace: [ 870.666388] [] sched_show_task+0xa8/0x110 [ 870.704271] [] dump_cpu_task+0x39/0x70 [ 870.738421] [] rcu_dump_cpu_stacks+0x90/0xd0 [ 870.775339] [] rcu_check_callbacks+0x442/0x730 [ 870.812353] [] ? tick_sched_do_timer+0x50/0x50 [ 870.848875] [] update_process_times+0x46/0x80 [ 870.884847] [] tick_sched_handle+0x30/0x70 [ 870.919740] [] tick_sched_timer+0x39/0x80 [ 870.953660] [] __hrtimer_run_queues+0xd4/0x260 [ 870.989276] [] hrtimer_interrupt+0xaf/0x1d0 [ 871.023481] [] local_apic_timer_interrupt+0x35/0x60 [ 871.061233] [] smp_apic_timer_interrupt+0x3d/0x50 [ 871.097838] [] apic_timer_interrupt+0x232/0x240 [ 871.133232] [] ? put_page_testzero+0x8/0x15 [ 871.170089] [] put_compound_page+0x151/0x174 [ 871.204221] [] put_page+0x45/0x50 [ 871.234554] [] cxiReleaseAndForgetPages+0xda/0x220 [mmfslinux] [ 871.275763] [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux] [ 871.316987] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 871.468753] [] mmfs+0xc85/0xca0 [mmfs26] [ 871.501196] [] gpfs_clean+0x26/0x30 [mmfslinux] [ 871.536562] [] cleanup_module+0x25/0x30 [mmfs26] [ 871.572110] [] SyS_delete_module+0x19b/0x300 [ 871.606048] [] system_call_fastpath+0x16/0x1b -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Thursday 12 July 2018 at 15:42 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown if that happens it would be interesting what top reports start top in a large resolution window (like 330x80) , press shift-H , this will break it down per Thread, also press 1 to have a list of each cpu individually and see if you can either spot one core on the top list with 0% idle or on the thread list on the bottom if any of the threads run at 100% core speed. attached is a screenshot which columns to look at , this system is idle, so nothing to see, just to show you where to look does this machine by any chance has either large maxfilestochache or is a token server ? -------------- next part -------------- An HTML attachment was scrubbed... 
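For anyone retracing the checks discussed here, the relevant commands are roughly these - the values shown in the thread are of course site-specific:

mmlsconfig maxFilesToCache           # 204800 on this CES-to-be node
mmlsconfig pagepool                  # the memory cxiReleaseAndForgetPages is handing back
mmdiag --tokenmgr                    # which nodes currently act as token servers
top -H -d 2                          # per-thread view; press 1 to break out individual CPUs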
URL: From oehmes at gmail.com Thu Jul 12 16:30:43 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 12 Jul 2018 08:30:43 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> Message-ID: Hi, the problem is the cleanup of the tokens and/or the openfile objects. i suggest you open a defect for this. sven On Thu, Jul 12, 2018 at 8:22 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > > > > > Hello Sven, > > > > The machine has > > > > maxFilesToCache 204800 (2M) > > > > it will become a CES node, hence the higher than default value. It?s just > a 3 node cluster with remote cluster mount and no activity (yet). But all > three nodes are listed as token server by ?mmdiag ?tokenmgr?. > > > > Top showed 100% idle on core 55. This matches the kernel messages about > rmmod being stuck on core 55. > > I didn?t see a dominating thread/process, but many kernel threads showed > 30-40% CPU, in sum that used about 50% of all cpu available. > > > > This time mmshutdown did return and left the module loaded, next mmstartup > tried to remove the ?old? module and got stuck :-( > > > > I append two links to screenshots > > > > Thank you, > > > > Heiner > > > > https://pasteboard.co/Hu86DKf.png > > https://pasteboard.co/Hu86rg4.png > > > > If the links don?t work I can post the images to the list. > > > > Kernel messages: > > > > [ 857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: G W OEL > ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > [ 857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, > BIOS P89 01/22/2018 > > [ 857.884938] task: ffff883ffafe8fd0 ti: ffff88342af30000 task.ti: > ffff88342af30000 > > [ 857.924120] RIP: 0010:[] [] > compound_unlock_irqrestore+0xe/0x20 > > [ 857.970708] RSP: 0018:ffff88342af33d38 EFLAGS: 00000246 > > [ 857.999742] RAX: 0000000000000000 RBX: ffff88207ffda068 RCX: > 00000000000000e5 > > [ 858.037165] RDX: 0000000000000246 RSI: 0000000000000246 RDI: > 0000000000000246 > > [ 858.074416] RBP: ffff88342af33d38 R08: 0000000000000000 R09: > 0000000000000000 > > [ 858.111519] R10: ffff88207ffcfac0 R11: ffffea00fff40280 R12: > 0000000000000200 > > [ 858.148421] R13: 00000001fff40280 R14: ffffffff8118cd84 R15: > ffff88342af33ce8 > > [ 858.185845] FS: 00007fc797d1e740(0000) GS:ffff883fff0c0000(0000) > knlGS:0000000000000000 > > [ 858.227062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 858.257819] CR2: 00000000004116d0 CR3: 0000003fc2ec0000 CR4: > 00000000001607e0 > > [ 858.295143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > > [ 858.332145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > > [ 858.369097] Call Trace: > > [ 858.384829] [] put_compound_page+0x149/0x174 > > [ 858.416176] [] put_page+0x45/0x50 > > [ 858.443185] [] cxiReleaseAndForgetPages+0xda/0x220 > [mmfslinux] > > [ 858.481751] [] ? cxiDeallocPageList+0xbd/0x110 > [mmfslinux] > > [ 858.518206] [] cxiDeallocPageList+0x45/0x110 > [mmfslinux] > > [ 858.554438] [] ? 
_raw_spin_lock+0x10/0x30 > > [ 858.585522] [] cxiFreeSharedMemory+0x12a/0x130 > [mmfslinux] > > [ 858.622670] [] kxFreeAllSharedMemory+0xe2/0x160 > [mmfs26] > > [ 858.659246] [] mmfs+0xc85/0xca0 [mmfs26] > > [ 858.689379] [] gpfs_clean+0x26/0x30 [mmfslinux] > > [ 858.722330] [] cleanup_module+0x25/0x30 [mmfs26] > > [ 858.755431] [] SyS_delete_module+0x19b/0x300 > > [ 858.786882] [] system_call_fastpath+0x16/0x1b > > [ 858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 > 13 5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d > <0f> 1f 44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 > > [ 859.068528] hrtimer: interrupt took 2877171 ns > > [ 870.517924] INFO: rcu_sched self-detected stall on CPU { 55} (t=240003 > jiffies g=18437 c=18436 q=194992) > > [ 870.577882] Task dump for CPU 55: > > [ 870.602837] rmmod R running task 0 16429 16374 > 0x00000008 > > [ 870.645206] Call Trace: > > [ 870.666388] [] sched_show_task+0xa8/0x110 > > [ 870.704271] [] dump_cpu_task+0x39/0x70 > > [ 870.738421] [] rcu_dump_cpu_stacks+0x90/0xd0 > > [ 870.775339] [] rcu_check_callbacks+0x442/0x730 > > [ 870.812353] [] ? tick_sched_do_timer+0x50/0x50 > > [ 870.848875] [] update_process_times+0x46/0x80 > > [ 870.884847] [] tick_sched_handle+0x30/0x70 > > [ 870.919740] [] tick_sched_timer+0x39/0x80 > > [ 870.953660] [] __hrtimer_run_queues+0xd4/0x260 > > [ 870.989276] [] hrtimer_interrupt+0xaf/0x1d0 > > [ 871.023481] [] local_apic_timer_interrupt+0x35/0x60 > > [ 871.061233] [] smp_apic_timer_interrupt+0x3d/0x50 > > [ 871.097838] [] apic_timer_interrupt+0x232/0x240 > > [ 871.133232] [] ? put_page_testzero+0x8/0x15 > > [ 871.170089] [] put_compound_page+0x151/0x174 > > [ 871.204221] [] put_page+0x45/0x50 > > [ 871.234554] [] cxiReleaseAndForgetPages+0xda/0x220 > [mmfslinux] > > [ 871.275763] [] ? cxiDeallocPageList+0xbd/0x110 > [mmfslinux] > > [ 871.316987] [] cxiDeallocPageList+0x45/0x110 > [mmfslinux] > > [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 > > [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 > [mmfslinux] > > [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 > [mmfs26] > > [ 871.468753] [] mmfs+0xc85/0xca0 [mmfs26] > > [ 871.501196] [] gpfs_clean+0x26/0x30 [mmfslinux] > > [ 871.536562] [] cleanup_module+0x25/0x30 [mmfs26] > > [ 871.572110] [] SyS_delete_module+0x19b/0x300 > > [ 871.606048] [] system_call_fastpath+0x16/0x1b > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > *From: * on behalf of Sven > Oehme > > > *Reply-To: *gpfsug main discussion list > > *Date: *Thursday 12 July 2018 at 15:42 > > > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown > > > > if that happens it would be interesting what top reports > > > > start top in a large resolution window (like 330x80) , press shift-H , > this will break it down per Thread, also press 1 to have a list of each cpu > individually and see if you can either spot one core on the top list with > 0% idle or on the thread list on the bottom if any of the threads run at > 100% core speed. > > attached is a screenshot which columns to look at , this system is idle, > so nothing to see, just to show you where to look > > > > does this machine by any chance has either large maxfilestochache or is a > token server ? 
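If a defect does get opened, a snap taken while rmmod is still spinning is the most useful attachment; something along these lines, with the node name adjusted:

gpfs.snap -N node-1                  # full collection limited to the affected node
gpfs.snap --deadlock                 # or the lighter, deadlock-focused collection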
> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Fri Jul 13 11:07:25 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 13 Jul 2018 10:07:25 +0000 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Message-ID: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 4s.png Type: image/png Size: 129914 bytes Desc: 4s.png URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Jul 15 18:24:43 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 15 Jul 2018 17:24:43 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? Message-ID: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... 
Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Jul 15 18:34:45 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Sun, 15 Jul 2018 17:34:45 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? In-Reply-To: References: Message-ID: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov> Hmm...have you dumped waiters across the entire cluster or just on the NSD servers/fs managers? Maybe there?s a slow node out there participating in the suspend effort? Might be worth running some quick tracing on the FS manager to see what it?s up to. On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L wrote: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Jul 15 20:11:26 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 15 Jul 2018 19:11:26 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? In-Reply-To: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov> References: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov> Message-ID: <08D6C49B-298F-4DAA-8FF3-BDAA6D9CE8FE@vanderbilt.edu> Hi All, So I had noticed some waiters on my NSD servers that I thought were unrelated to the mmchdisk. However, I decided to try rebooting my NSD servers one at a time (mmshutdown failed!) to clear that up ? and evidently one of them had things hung up because the mmchdisk start completed. Thanks? Kevin On Jul 15, 2018, at 12:34 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] > wrote: Hmm...have you dumped waiters across the entire cluster or just on the NSD servers/fs managers? 
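Concretely, the cluster-wide waiter check and a short trace of the manager could look like the following - the manager node name is a placeholder:

mmlsmgr gpfs22                       # find the file system manager for the affected file system
mmdiag --waiters                     # run on every node (e.g. via a parallel ssh loop), not only the NSD servers
mmtracectl --start -N managernode ; sleep 30 ; mmtracectl --stop -N managernode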
Maybe there?s a slow node out there participating in the suspend effort? Might be worth running some quick tracing on the FS manager to see what it?s up to. On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L > wrote: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd518db52846a4be34e2208d5ea7a00d7%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636672732087040757&sdata=m77IpWNOlODc%2FzLiYI2qiPo9Azs8qsIdXSY8%2FoC6Nn0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Thu Jul 19 15:05:39 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Thu, 19 Jul 2018 14:05:39 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Message-ID: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. 
M?nchen URL: http://www.lrz.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jul 19 15:23:42 2018 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 19 Jul 2018 10:23:42 -0400 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> References: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> Message-ID: Hi Stephan: I think every node in C1 and in C2 have to see every node in the server cluster NSD-[AD]. We have a 10 node server cluster where 2 nodes do nothing but server out nfs. Since these two are apart of the server cluster...client clusters wanting to mount the server cluster via gpfs need to see them. I think both OPA fabfics need to be on all 4 of your server nodes. Eric On Thu, Jul 19, 2018 at 10:05 AM, Peinkofer, Stephan < Stephan.Peinkofer at lrz.de> wrote: > Dear GPFS List, > > does anyone of you know, if it is possible to have multiple file systems > in a GPFS Cluster that all are served primary via Ethernet but for which > different ?booster? connections to various IB/OPA fabrics exist. > > For example let?s say in my central Storage/NSD Cluster, I implement two > file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is > served by NSD-C and NSD-D. > Now I have two client Clusters C1 and C2 which have different OPA fabrics. > Both Clusters can mount the two file systems via Ethernet, but I now add > OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for > NSD-C and NSD-D to C2?s fabric and just switch on RDMA. > As far as I understood, GPFS will use RDMA if it is available between two > nodes but switch to Ethernet if RDMA is not available between the two > nodes. So given just this, the above scenario could work in principle. But > will it work in reality and will it be supported by IBM? > > Many thanks in advance. > Best Regards, > Stephan Peinkofer > -- > Stephan Peinkofer > Leibniz Supercomputing Centre > Data and Storage Division > Boltzmannstra?e 1, 85748 Garching b. M?nchen > URL: http://www.lrz.de > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 19 16:42:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Jul 2018 15:42:48 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Message-ID: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> I think what you want is to use fabric numbers with verbsPorts, e.g. we have two IB fabrics and in the config we do thinks like: [nodeclass1] verbsPorts mlx4_0/1/1 [nodeclass2] verbsPorts mlx5_0/1/3 GPFS recognises the /1 or /3 at the end as a fabric number and knows they are separate and will Ethernet between those nodes instead. Simon From: on behalf of "Stephan.Peinkofer at lrz.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 July 2018 at 15:13 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. 
For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Jul 19 17:54:22 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 19 Jul 2018 12:54:22 -0400 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> References: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> Message-ID: To add to the excellent advice others have already provided, I think you have fundamentally 2 choices: - Establish additional OPA connections from NSD-A and NSD-B to cluster C2 and from NSD-C and NSD-D to cluster C1 *or* - Add NSD-A and NSD-B as nsd servers for the NSDs for FS2 and add NSD-C and NSD-D as nsd servers for the NSDs for FS1. (Note: If you're running Scale 5.0 you can change the NSD server list with the FS available and mounted, else you'll need an outage to unmount the FS and change the NSD server list.) It's a matter of what's preferable (aasier, cheaper, etc.)-- adding OPA connections to the NSD servers or adding additional LUN presentations (which may involve SAN connections, of course) to the NSD servers. In our environment we do the latter and it works very well for us. -Aaron On 7/19/18 11:42 AM, Simon Thompson wrote: > I think what you want is to use fabric numbers with verbsPorts, e.g. we > have two IB fabrics and in the config we do thinks like: > > [nodeclass1] > > verbsPorts mlx4_0/1/1 > > [nodeclass2] > > verbsPorts mlx5_0/1/3 > > GPFS recognises the /1 or /3 at the end as a fabric number and knows > they are separate and will Ethernet between those nodes instead. > > Simon > > *From: * on behalf of > "Stephan.Peinkofer at lrz.de" > *Reply-To: *"gpfsug-discuss at spectrumscale.org" > > *Date: *Thursday, 19 July 2018 at 15:13 > *To: *"gpfsug-discuss at spectrumscale.org" > *Subject: *[gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD > Cluster > > Dear GPFS List, > > does anyone of you know, if it is possible to have multiple file systems > in a GPFS Cluster that all are served primary via Ethernet but for which > different ?booster? connections to various IB/OPA fabrics exist. > > For example let?s say in my central Storage/NSD Cluster, I implement two > file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is > served by NSD-C and NSD-D. > > Now I have two client Clusters C1 and C2 which have different OPA > fabrics. 
Both Clusters can mount the two file systems via Ethernet, but > I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA > connections for NSD-C and NSD-D to ?C2?s fabric and just switch on RDMA. > > As far as I understood, GPFS will use RDMA if it is available between > two nodes but switch to Ethernet if RDMA is not available between the > two nodes. So given just this, the above scenario could work in > principle. But will it work in reality and will it be supported by IBM? > > Many thanks in advance. > > Best Regards, > > Stephan Peinkofer > > -- > Stephan Peinkofer > Leibniz Supercomputing Centre > Data and Storage Division > Boltzmannstra?e 1, 85748 Garching b.?M?nchen > URL: http://www.lrz.de > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From valdis.kletnieks at vt.edu Thu Jul 19 22:25:23 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 19 Jul 2018 17:25:23 -0400 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? Message-ID: <25435.1532035523@turing-police.cc.vt.edu> So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error(archive/system) (...) Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn(archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jul 19 23:23:06 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jul 2018 22:23:06 +0000 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? 
In-Reply-To: <25435.1532035523@turing-police.cc.vt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> Hi Valdis, Is this what you?re looking for (from an IBMer in response to another question a few weeks back)? assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 19, 2018, at 4:25 PM, valdis.kletnieks at vt.edu wrote: So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error(archive/system) (...) 
Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn(archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ca2e808fa12e74ed277bc08d5edc51bc3%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636676353194563950&sdata=5biJuM0K0XwEw3BMwbS5epNQhrlig%2FFON7k1V79G%2Fyc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Fri Jul 20 07:39:24 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 20 Jul 2018 06:39:24 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> References: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> Message-ID: <05cf5689138043da8321b728f320834c@lrz.de> Dear Simon and List, thanks. That was exactly I was looking for. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Thursday, July 19, 2018 5:42 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster I think what you want is to use fabric numbers with verbsPorts, e.g. we have two IB fabrics and in the config we do thinks like: [nodeclass1] verbsPorts mlx4_0/1/1 [nodeclass2] verbsPorts mlx5_0/1/3 GPFS recognises the /1 or /3 at the end as a fabric number and knows they are separate and will Ethernet between those nodes instead. Simon From: on behalf of "Stephan.Peinkofer at lrz.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 July 2018 at 15:13 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. 
Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de LRZ: Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften www.lrz.de Das LRZ ist das Rechenzentrum f?r die M?nchner Universit?ten, die Bayerische Akademie der Wissenschaften sowie nationales Zentrum f?r Hochleistungsrechnen. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 20 09:29:29 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 20 Jul 2018 16:29:29 +0800 Subject: [gpfsug-discuss] mmfsadddisk command interrupted In-Reply-To: References: Message-ID: Hi Damir, Since many GPFS management command got unresponsive and you are running ESS, mail-list maybe not a good way to track this kinds of issue. Could you please raise a ticket to ESS/SpectrumScale to get help from IBM Service team? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Damir Krstic To: gpfsug main discussion list Date: 06/23/2018 03:04 AM Subject: [gpfsug-discuss] mmfsadddisk command interrupted Sent by: gpfsug-discuss-bounces at spectrumscale.org We were adding disks to one of our larger filesystems today. During the "checking allocation map for storage pool system" we had to interrupt the command since it was causing slow downs on our filesystem. Now commands like mmrepquota, mmdf, etc. are timing out with tsaddisk command is running message. Also during the run of the mmdf, mmrepquota, etc. filesystem becomes completely unresponsive. This command was run on ESS running version 5.2.0. Any help is much appreciated. Thank you. Damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From YARD at il.ibm.com Sat Jul 21 21:22:47 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sat, 21 Jul 2018 23:22:47 +0300 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi Do u run mmbackup on snapshot , which is read only ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From p.childs at qmul.ac.uk Sun Jul 22 12:26:35 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Sun, 22 Jul 2018 11:26:35 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. 
I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0C9372140C936C60006FF189C22582D1] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_0C9306EC0C92FECC006FF189C22582D1][cid:_1_0C9308F40C92FECC006FF189C22582D1] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: ATT00001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: ATT00002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 5093 bytes Desc: ATT00003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 4746 bytes Desc: ATT00004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... 
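On the mmchqos point: in 4.2.x the throttling is per I/O class (maintenance vs. other) and per storage pool rather than per node, so the nearest approximation is to cap the "other" class while the backup scan runs. A sketch, with an illustrative file system name (per-node limits, where they exist at all, are release-dependent - check the mmchqos man page first):

mmchqos gpfs0 --enable pool=*,other=20000IOPS,maintenance=unlimited
mmlsqos gpfs0                        # watch per-class consumption
mmchqos gpfs0 --disable              # revert once the scan has finished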
Name: ATT00005.gif Type: image/gif Size: 4557 bytes Desc: ATT00005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 5093 bytes Desc: ATT00006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.jpg Type: image/jpeg Size: 11294 bytes Desc: ATT00007.jpg URL: From jose.filipe.higino at gmail.com Sun Jul 22 13:51:03 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 00:51:03 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00005.gif Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.jpg Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: not available URL: From scale at us.ibm.com Mon Jul 23 04:06:33 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 23 Jul 2018 11:06:33 +0800 Subject: [gpfsug-discuss] -o syncnfs has no effect? In-Reply-To: References: Message-ID: Hi, mmchfs Device -o syncnfs is the correct way of setting the syncnfs so that it applies to the file system both on the home and the remote cluster On 4.2.3+ syncnfs is the default option on Linux . Which means GPFS will implement the syncnfs behavior regardless of what the mount command says The documentation indicates that mmmount Device -o syncnfs=yes appears to be the correct syntax. When I tried that, I do see 'syncnfs=yes' in the output of the 'mount' command To change the remote mount option so that you don't have to specify the option on the command line every time you do mmmount, instead of using mmchfs, one should use mmremotefs update -o. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
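Putting the advice above together, a minimal sketch of the commands involved might look like this; fs1 is an assumed device name, and the exact mmremotefs syntax should be verified against the documentation for your release:

    # on the owning (home) cluster: record syncnfs as a permanent mount option
    mmchfs fs1 -o syncnfs
    # on the remote cluster running the protocol nodes: carry the option on the
    # remote file system definition instead of passing it to every mmmount
    mmremotefs update fs1 -o syncnfs
    # remount and check what the kernel actually sees
    mmmount fs1 -o syncnfs
    mount | grep fs1

On 4.2.3 and later Linux nodes the syncnfs behaviour is the default anyway, so the commands above mainly make the intent visible in the configuration.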
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (PSI)" To: gpfsug main discussion list Date: 07/06/2018 12:06 AM Subject: [gpfsug-discuss] -o syncnfs has no effect? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I try to mount a fs with "-o syncnfs" as we'll export it with CES/Protocols. But I never see the mount option displayed when I do # mount | grep fs-name This is a remote cluster mount, we'll run the Protocol nodes in a separate cluster. On the home cluster I see the option 'nfssync' in the output of 'mount'. My conclusion is that the mount option "syncnfs" has no effect on remote cluster mounts. Which seems a bit strange? Please can someone clarify on this? What is the impact on protocol nodes exporting remote cluster mounts? Is there any chance of data corruption? Or are some mount options implicitely inherited from the home cluster? I've read 'syncnfs' is default on Linux, but I would like to know for sure. Funny enough I can pass arbitrary options with # mmmount -o some-garbage which are silently ignored. I did 'mmchfs -o syncnfs' on the home cluster and the syncnfs option is present in /etc/fstab on the remote cluster. I did not remount on all nodes __ Thank you, I'll appreciate any hints or replies. Heiner Versions: Remote cluster 5.0.1 on RHEL7.4 (imounts the fs and runs protocol nodes) Home cluster 4.2.3-8 on RHEL6 (export the fs, owns the storage) Filesystem: 17.00 (4.2.3.0) All Linux x86_64 with Spectrum Scale Standard Edition -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Mon Jul 23 07:51:54 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 23 Jul 2018 14:51:54 +0800 Subject: [gpfsug-discuss] mmdiag --iohist question In-Reply-To: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> References: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Message-ID: Hi Please check the IO type before examining the IP address for the output of mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and we don't show it. Please check whether this is your case. 
=== mmdiag: iohist === I/O history: I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD node --------------- -- ----------- ----------------- ----- ------- ---- ------------------ --------------- 01:14:08.450177 R inode 6:189513568 8 4.920 srv dm-4 192.168.116.92 01:14:08.450448 R inode 6:189513664 8 4.968 srv dm-4 192.168.116.92 01:14:08.475689 R inode 6:189428264 8 0.230 srv dm-4 192.168.116.92 01:14:08.983587 W logData 4:30686784 8 0.216 lcl dm-0 01:14:08.983601 W logData 3:25468480 8 0.197 lcl dm-8 01:14:08.983961 W inode 2:188808504 8 0.142 lcl dm-11 01:14:08.984144 W inode 1:188808504 8 0.134 lcl dm-7 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/11/2018 10:34 PM Subject: [gpfsug-discuss] mmdiag --iohist question Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.childs at qmul.ac.uk Mon Jul 23 09:37:41 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 08:37:41 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. 
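The mmpmon numbers mentioned above are typically gathered along these lines. This is only a sketch: the fs_io_s request is a standard mmpmon input, but the sample interval, repeat count and output handling here are assumptions.

    # sample aggregate I/O statistics: 12 samples, 5 seconds apart, parseable output
    echo "fs_io_s" > /tmp/mmpmon.in
    /usr/lpp/mmfs/bin/mmpmon -i /tmp/mmpmon.in -r 12 -d 5000 -p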
Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0C9372140C936C60006FF189C22582D1] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_0C9306EC0C92FECC006FF189C22582D1][cid:_1_0C9308F40C92FECC006FF189C22582D1] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. 
Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Mon Jul 23 11:13:56 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 22:13:56 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? How many quorum nodes? How many filesystems? Is the management network the same as the daemon network? On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you > tell us a bit more about the setup? > > > Not really, It feels like a perfect storm, any one of the tasks running on > its own would be fine, Its the shear load, our mmpmon data says the storage > has been flat lining when it occurs. > > Its a reasonably standard (small) HPC cluster, with a very mixed work > load, hence while we can usually find "bad" jobs from the point of view of > io on this occasion we can see a few large array jobs all accessing the > same file, the cluster runs fine until we get to a certain point and one > more will tip the balance. We've been attempting to limit the problem by > adding limits to the number of jobs in an array that can run at once. But > that feels like fire fighting. > > > Are you using GPFS API over any administrative commands? Any problems with > the network (being that Ethernet or IB)? 
> > > We're not as using the GPFS API, never got it working, which is a shame, > I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests > by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit > network currently, we're currently looking at removing all the 1GBit nodes > within the next few months and adding some new faster kit. The Storage is > attached at 40GBit but it does not look to want to run much above 5Gbit I > suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to > help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 23 12:06:20 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 11:06:20 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? 
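For completeness, the kind of quick sweep being described here (waiters, daemon log, manager placement, node health) is usually along these lines; mmhealth is available at 4.2.3, but treat the exact subcommands as something to confirm against your level:

    mmdiag --waiters                          # any long-running RPC waiters?
    tail -n 100 /var/adm/ras/mmfs.log.latest  # expels, errors, long I/O warnings
    mmlsmgr                                   # which node currently manages the fs
    mmhealth node show                        # per-node component health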
Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [X] Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][X][X] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Mon Jul 23 12:59:22 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 23:59:22 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: Are the tiebreaker disks part of the same storage that is being used to provide disks for the NSDs of your filesystem? Having both management and daemon networks on the same network can impact the cluster in many ways. Depending on the requirements and workload conditions to run the cluster. Especially if the network is not 100% top notch or can be affected by external factors (other types of utilization). I would recur to a recent (and/or run a new one) performance benchmark result (IOR and MDTEST) and try to understand if the recordings of the current performance while observing the problem really tell something new. 
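A baseline run of the benchmarks suggested above might look like the sketch below; the process counts, transfer sizes and target paths are invented for illustration and should be sized to the cluster being tested:

    # IOR: streaming bandwidth, file-per-process, 1 MiB transfers, 4 GiB per process
    mpirun -np 16 ior -w -r -F -t 1m -b 4g -o /gpfs/fs1/benchmarks/ior.dat
    # mdtest: metadata (create/stat/remove) rates, 1000 items per process, 3 iterations
    mpirun -np 16 mdtest -n 1000 -i 3 -d /gpfs/fs1/benchmarks/mdtest

Comparing a quiet-time run against a run taken while the problem workload is active helps show whether the storage is genuinely at its performance ceiling.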
If not (if benchmarks tell that you are at the edge of the performance, then the best would be to consider increasing cluster performance) with additional disk hardware and/or network performance. If possible I would also recommend upgrading to the new Spectrum Scale 5 that have many new performance features. On Mon, 23 Jul 2018 at 23:06, Peter Childs wrote: > On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: > > I think the network problems need to be cleared first. Then I would > investigate further. > > Buf if that is not a trivial path... > Are you able to understand from the mmfslog what happens when the tipping > point occurs? > > > mmfslog thats not a term I've come accross before, if you mean > /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In > other words no expulsions or errors just a very slow filesystem, We've not > seen any significantly long waiters either (mmdiag --waiters) so as far as > I can see its just behaving like a very very busy filesystem. > > We've already had IBM looking at the snaps due to the rather slow mmbackup > process, all I've had back is to try increase -a ie the number of sort > threads which has speed it up to a certain extent, But once again I think > we're looking at the results of the issue not the cause. > > > In my view, when troubleshooting is not easy, the usual methods work/help > to find the next step: > - Narrow the window of troubleshooting (by discarding "for now" events > that did not happen within the same timeframe) > - Use "as precise" as possible, timebased events to read the reaction of > the cluster (via log or others) and make assumptions about other observed > situations. > - If possible and when the problem is happening, run some traces, > gpfs.snap and ask for support via PMR. > > Also, > > What is version of GPFS? > > > 4.2.3-8 > > How many quorum nodes? > > > 4 Quorum nodes with tie breaker disks, however these are not the file > system manager nodes as to fix a previous problem (with our nsd servers not > being powerful enough) our fsmanager nodes are on hardware, We have two > file system manager nodes (Which do token management, quota management etc) > they also run the mmbackup. > > How many filesystems? > > > 1, although we do have a second that is accessed via multi-cluster from > our older GPFS setup, (thats running 4.2.3-6 currently) > > Is the management network the same as the daemon network? > > > Yes. the management network and the daemon network are the same network. > > Thanks in advance > > Peter Childs > > > > On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: > > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you > tell us a bit more about the setup? > > > Not really, It feels like a perfect storm, any one of the tasks running on > its own would be fine, Its the shear load, our mmpmon data says the storage > has been flat lining when it occurs. > > Its a reasonably standard (small) HPC cluster, with a very mixed work > load, hence while we can usually find "bad" jobs from the point of view of > io on this occasion we can see a few large array jobs all accessing the > same file, the cluster runs fine until we get to a certain point and one > more will tip the balance. We've been attempting to limit the problem by > adding limits to the number of jobs in an array that can run at once. But > that feels like fire fighting. 
> > > Are you using GPFS API over any administrative commands? Any problems with > the network (being that Ethernet or IB)? > > > We're not as using the GPFS API, never got it working, which is a shame, > I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests > by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit > network currently, we're currently looking at removing all the 1GBit nodes > within the next few months and adding some new faster kit. The Storage is > attached at 40GBit but it does not look to want to run much above 5Gbit I > suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to > help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jul 23 13:06:22 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 23 Jul 2018 08:06:22 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk><51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/23/2018 07:06 AM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. 
We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: Yes, we run mmbackup, using a snapshot. 
The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jul 23 19:12:25 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 23 Jul 2018 14:12:25 -0400 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? 
In-Reply-To: <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> Message-ID: <22017.1532369545@turing-police.cc.vt.edu> On Thu, 19 Jul 2018 22:23:06 -0000, "Buterbaugh, Kevin L" said: > Is this what you???re looking for (from an IBMer in response to another question a few weeks back)? > > assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: Nope, that bring zero joy (though it did give me a chance to set a more appropriate set of thresholds for our environment. And I'm still perplexed as to *where* those events are stored - what's remembering it after a 'mmhealth eventlog --clear -N all'? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 23 21:05:05 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 23 Jul 2018 20:05:05 +0000 Subject: [gpfsug-discuss] mmdiag --iohist question In-Reply-To: References: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Message-ID: Hi GPFS team, Yes, that?s what we see, too ? thanks. Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 23, 2018, at 1:51 AM, IBM Spectrum Scale > wrote: Hi Please check the IO type before examining the IP address for the output of mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and we don't show it. Please check whether this is your case. === mmdiag: iohist === I/O history: I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD node --------------- -- ----------- ----------------- ----- ------- ---- ------------------ --------------- 01:14:08.450177 R inode 6:189513568 8 4.920 srv dm-4 192.168.116.92 01:14:08.450448 R inode 6:189513664 8 4.968 srv dm-4 192.168.116.92 01:14:08.475689 R inode 6:189428264 8 0.230 srv dm-4 192.168.116.92 01:14:08.983587 W logData 4:30686784 8 0.216 lcl dm-0 01:14:08.983601 W logData 3:25468480 8 0.197 lcl dm-8 01:14:08.983961 W inode 2:188808504 8 0.142 lcl dm-11 01:14:08.984144 W inode 1:188808504 8 0.134 lcl dm-7 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Buterbaugh, Kevin L" ---07/11/2018 10:34:32 PM---Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? 
what does it From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/11/2018 10:34 PM Subject: [gpfsug-discuss] mmdiag --iohist question Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255264001433&sdata=uSiXYheeOw%2F4%2BSls8lP3XO9w7i7dFc3UWEYa%2F8aIn%2B0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 23 21:06:14 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 20:06:14 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk><51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> , Message-ID: ---- Frederick Stock wrote ---- > Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Considered, but never really understood the logic or value of building a second network, nor seen a good argument for the additional cost and work setting it up. While I've heard it lots of times, that the network is key to good gpfs performance. I've actually always found that it can be lots of other things too and your usally best keeping and open view and checking everything. This issue disappeared on Friday when the file system manager locked up entirely, and we failed it over to the other one and restarted gpfs. It's been fine all weekend, and currently it's looking to be a failed gpfs daemon on the manager node that was causing all the bad io. If I'd know that I'd have restarted gpfs on that node earlier... > > Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? > > You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? 
> Our NSD servers are virtual; everything else on the cluster is real. It's a GridScaler GS7K, hence why it's difficult to throw more power at the issue. We are looking at upgrading to 5.0.1 within the next few months, as we're in the process of adding a new SSD-based scratch filesystem to the cluster. Hopefully this will help resolve some of our issues. Peter Childs. > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: Peter Childs > > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/23/2018 07:06 AM > Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Mon, 2018-07-23 at 22:13 +1200, José Filipe Higino wrote: > I think the network problems need to be cleared first. Then I would investigate further. > > But if that is not a trivial path... > Are you able to understand from the mmfslog what happens when the tipping point occurs? > > mmfslog is not a term I've come across before; if you mean /var/adm/ras/mmfs.log.latest, I'm already there and there is not a lot in it. In other words, no expulsions or errors, just a very slow filesystem. We've not seen any significantly long waiters either (mmdiag --waiters), so as far as I can see it's just behaving like a very, very busy filesystem. > > We've already had IBM looking at the snaps due to the rather slow mmbackup process; all I've had back is to try increasing -a, i.e. the number of sort threads, which has sped it up to a certain extent. But once again I think we're looking at the results of the issue, not the cause.
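For reference, the waiter check mentioned above can simply be left running while the slowdown is happening; a minimal sketch, assuming root access on the file system manager and a couple of busy clients (the interval and line count are arbitrary choices, not recommendations):

  mmdiag --waiters                              # one-shot list of current RPC waiters on this node
  watch -n 10 'mmdiag --waiters | head -n 20'   # repeat every 10 seconds to spot a backlog building up
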
> > Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. > > > Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? > > We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. > > We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > > > Yaron Daniel 94 Em Ha'Moshavot Rd > > Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel > > > > > > > From: Peter Childs > > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
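For reference, the ARP-cache enlargement mentioned earlier in this thread is normally done with the kernel neighbour-table sysctls; a sketch with purely illustrative values (the upstream defaults are 128/512/1024):

  sysctl -w net.ipv4.neigh.default.gc_thresh1=4096
  sysctl -w net.ipv4.neigh.default.gc_thresh2=8192
  sysctl -w net.ipv4.neigh.default.gc_thresh3=16384
  # persist the same values under /etc/sysctl.d/ so they survive a reboot
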
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/23/2018 07:06 AM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. 
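On the maxFilesToCache idea above: a minimal sketch, assuming the change is limited to the manager/token-manager nodes (the value and node names are placeholders, the value must be sized to available memory, and GPFS has to be restarted on those nodes for it to take effect):

  mmchconfig maxFilesToCache=1000000 -N fsmanager1,fsmanager2   # placeholder node names and value
  mmlsconfig maxFilesToCache                                    # confirm the per-node override
  # restart GPFS on those nodes for the new value to apply, e.g.
  # mmshutdown -N fsmanager1,fsmanager2 && mmstartup -N fsmanager1,fsmanager2
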
In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. 
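On the mmchqos idea: in 4.2.x, QoS throttling is applied per storage pool and I/O class (maintenance vs. other) rather than per node, so it can cap the maintenance-class traffic generated by commands such as the mmbackup/policy scan but cannot single out individual client nodes. A hedged sketch, with the file system name, pool and IOPS figure as placeholders:

  mmchqos gpfs0 --enable pool=system,maintenance=300IOPS,other=unlimited   # throttle maintenance-class I/O only
  mmlsqos gpfs0 --seconds 60                                               # watch per-class IOPS over the last minute
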
We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Tue Jul 24 08:45:03 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Tue, 24 Jul 2018 09:45:03 +0200 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? In-Reply-To: <25435.1532035523@turing-police.cc.vt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: Hi, that message is still in memory. "mmhealth node eventlog --clear" deletes all old events but those which are currently active are not affected. I think this is related to multiple Collector Nodes, will dig deeper into that code to find out if some issue lurks there. As a stop-gap measure one could execute "mmsysmoncontrol restart" on the affected node(s) as this stops the monitoring process and doing so clears the event in memory. 
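Putting that stop-gap in one place — a sketch only, using the commands already named in this thread (the component argument on the first line is an assumption):

  mmhealth node show FILESYSTEM -v          # confirm which event is still shown as active
  mmsysmoncontrol restart                   # restart the monitor so the stale in-memory event is dropped
  mmhealth node eventlog --clear -N all     # then clear the historical eventlog again
  mmhealth cluster show                     # verify the DEGRADED state is gone
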
The data used for the event comes from mmlspool (should be close or identical to mmdf) Mit freundlichen Gr??en / Kind regards Norbert Schuld From: valdis.kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Date: 20/07/2018 00:15 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? Sent by: gpfsug-discuss-bounces at spectrumscale.org So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error (archive/system) (...) Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn (archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? [attachment "attccdgx.dat" deleted by Norbert Schuld/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Tue Jul 24 14:43:52 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 24 Jul 2018 13:43:52 +0000 Subject: [gpfsug-discuss] control which hosts become token manager Message-ID: Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. 
Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From bzhang at ca.ibm.com Tue Jul 24 16:03:54 2018 From: bzhang at ca.ibm.com (Bohai Zhang) Date: Tue, 24 Jul 2018 11:03:54 -0400 Subject: [gpfsug-discuss] IBM Elastic Storage Server (ESS) Support is going to host a client facing webinar In-Reply-To: References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: Hi all, IBM Elastic Storage Server support team is going to host a webinar to discuss Spectrum Scale (GPFS) encryption. Everyone is welcome. Please use the following links to register. Thanks, NA/EU Session Date: Aug 8, 2018 Time: 10 AM - 11 AM EDT (2 PM ? 3 PM GMT) Registration: https://ibm.biz/BdY4SE Audience: Scale/ESS administrators. AP/JP/India Session Date: Aug 9, 2018 Time: 10 AM - 11 AM Beijing Time (11 AM ? 12? AM Tokyo Time) Registration: https://ibm.biz/BdY4SH Audience: Scale/ESS administrators. Regards, IBM Spectrum Computing Bohai Zhang Critical Senior Technical Leader, IBM Systems Situation Tel: 1-905-316-2727 Resolver Mobile: 1-416-897-7488 Expert Badge Email: bzhang at ca.ibm.com 3600 STEELES AVE EAST, MARKHAM, ON, L3R 9Z7, Canada Live Chat at IBMStorageSuptMobile Apps Support Portal | Fix Central | Knowledge Center | Request for Enhancement | Product SMC IBM | dWA We meet our service commitment only when you are very satisfied and EXTREMELY LIKELY to recommend IBM. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C553518.jpg Type: image/jpeg Size: 124313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C974093.gif Type: image/gif Size: 2665 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C503228.gif Type: image/gif Size: 275 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C494180.gif Type: image/gif Size: 305 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1C801702.gif Type: image/gif Size: 331 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C254205.gif Type: image/gif Size: 3621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C585014.gif Type: image/gif Size: 1243 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue Jul 24 20:28:34 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 24 Jul 2018 19:28:34 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: Message-ID: What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. >From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 24 22:12:06 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 24 Jul 2018 21:12:06 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: Message-ID: <366795a1f7b34edc985d85124f787774@jumptrading.com> Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn't a way to specify a preferred manager per FS... 
(Bryan starts typing up a new RFE...). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. >From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don't want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that 'mmdiag -tokenmgr' lists the machine as active token manager. The machine has role 'quorum-client'. This doesn't seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. 
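For completeness, the checks and moves suggested above for manager placement, in one place (a sketch only; the node and file system names are the ones from the question):

  mmlsmgr                        # which node is cluster manager and which node manages each file system
  mmchmgr perf node-1.psi.ch     # move the file system manager for 'perf', as suggested above
  mmchmgr tiered node-2.psi.ch   # put 'tiered' on the other manager node
  mmdiag --tokenmgr              # re-check which nodes are now acting as token servers
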
Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Wed Jul 25 17:40:46 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 25 Jul 2018 16:40:46 +0000 Subject: [gpfsug-discuss] Brief survey question: Spectrum Scale downloads and protocols Message-ID: The Spectrum Scale team is considering a change to Scale's packaging, and we'd like to get input from as many of you as possible on the likely impact. Today, Scale is available to download in two images: With Protocols, and Without Protocols. We'd like to do away with this and in future just have one image, With Protocols. To be clear, installing Protocols will still be entirely optional -- it's only the download that will change. You can find the survey here: www.surveygizmo.com/s3/4476580/IBM-Spectrum-Scale-Packaging For those interested in a little more background... Why change this? Because making two images for every Edition for every release and patch is additional work, with added testing and more opportunities for mistakes to creep in. If it's not adding real value, we'd prefer not to keep doing it! Why do we need to ask first? Because we've been doing separate images for a long time, and there was a good reason why we started doing it. But it's not clear that the original reasons are still relevant. However, we don't want to make that assumption without asking first. Thanks in advance for your help, Carl Zetie Offering Manager for Spectrum Scale, IBM - (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From SAnderson at convergeone.com Wed Jul 25 19:57:03 2018 From: SAnderson at convergeone.com (Shaun Anderson) Date: Wed, 25 Jul 2018 18:57:03 +0000 Subject: [gpfsug-discuss] Compression details Message-ID: <1532545023753.65276@convergeone.com> I've had the question come up about how SS will handle file deletion as well as overhead required for compression using zl4. The two questions I'm looking for answers (or better yet, reference material documenting) to are: 1) - How is file deletion handled? Is the block containing the compressed file decompressed, the file deleted, and then recompressed? Or is metadata simply updated showing the file is to be deleted? Does Scale run an implicit 'mmchattr --compression no' command? 2) - Are there any guidelines on the overhead to plan for in a compressed environment (lz4)? I'm not seeing any kind of sizing guidance. This is potentially going to be for an exisitng ESS GL2 system. Any assistance or direction is appreciated. Regards, ? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments hereto may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Jul 26 00:05:27 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 25 Jul 2018 23:05:27 +0000 Subject: [gpfsug-discuss] Compression details In-Reply-To: <1532545023753.65276@convergeone.com> References: <1532545023753.65276@convergeone.com> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 26 14:24:14 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 26 Jul 2018 08:24:14 -0500 Subject: [gpfsug-discuss] Compression details In-Reply-To: <1532545023753.65276@convergeone.com> References: <1532545023753.65276@convergeone.com> Message-ID: > 1) How is file deletion handled? This depends on whether there's snapshot and whether COW is needed. If COW is not needed or there's no snapshot at all, then the file deletion is handled as non-compressed file(don't decompress the data blocks and simply discard the data blocks, then delete the inode). However, even if COW is needed, then uncompression before COW is only needed when one of following conditions is true. 1) the block to be moved is not the first block of a compression group(10 blocks is compression group since block 0). 2) the compression group ends beyond the last block of destination file (file in latest snapshot). 3) the compression group is not full and the destination file is larger. 4) the compression group ends at the last block of destination file, but the size between source and destination files are different. 5) the destination file already has some allocated blocks(COWed) within the compression group. > 2) Are there any guidelines LZ4 compression algorithm is already made good trade-off between performance and compression ratio. So it really depends on your data characters and access patterns. For example: if the data is write-once but read-many times, then there shouldn't be too much overhead as only compressed one time(I suppose decompression with lz4 doesn't consume too much resource as compression). If your data is really randomized, then compressing with lz4 doesn't give back too much help on storage space save, but still need to compress data as well as decompression when needed. But note that compressed data could also reduce the overhead to storage and network because smaller I/O size would be done for compressed file, so from application overall point of view, the overhead could be not added at all.... Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre-marie.brunet at cnes.fr Fri Jul 27 01:06:44 2018 From: pierre-marie.brunet at cnes.fr (Brunet Pierre-Marie) Date: Fri, 27 Jul 2018 00:06:44 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? 
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. 
> GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** From scale at us.ibm.com Fri Jul 27 12:56:02 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 06:56:02 -0500 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: References: Message-ID: errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Brunet Pierre-Marie To: "gpfsug-discuss at spectrumscale.org" Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. 
Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From xhejtman at ics.muni.cz Fri Jul 27 13:06:11 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 27 Jul 2018 14:06:11 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: References: Message-ID: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> Hello, no it is not. It's a bug in GPFS vfs layer, efix has been already released. On Fri, Jul 27, 2018 at 06:56:02AM -0500, IBM Spectrum Scale wrote: > > errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. > > > /* Defined for the NFSv3 protocol */ > #define EBADHANDLE 521 /* Illegal NFS file handle */ > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/26/2018 07:17 PM > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 > and RHEL 7.5 with 4 gateways servers executing Kernel NFS... > => random "Unknown error 521" on NFS clients. > > Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers > crossed !) up to now, it seems to work properly. > > Is there any official recommendation from IBM on this problem ? 
> > Regards, > PM > -- > HPC center > French space agency > > -----Message d'origine----- > De?: gpfsug-discuss-bounces at spectrumscale.org > De la part de > gpfsug-discuss-request at spectrumscale.org > Envoy??: jeudi 14 juin 2018 13:00 > ??: gpfsug-discuss at spectrumscale.org > Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific than > "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) > 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 13 Jun 2018 17:45:44 +0300 > From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > > > > Content-Type: text/plain; charset="iso-8859-1" > > Please open a service ticket > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add > > HA > > > to NFS on top of GPFS - > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > > > > ). > > > > knfs and cNFS can't coexist with CES in the same environment. > > well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 13 Jun 2018 15:14:53 +0000 > From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > > > > Content-Type: text/plain; charset="utf-8" > > We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version > 3.10.0-693.21.1.el7.x86_64 and are not having any errors. > So it's probably just GPFS not being ready for 7.5 yet. > > Neil. > > Neil Wilson? Senior IT Practitioner > Storage, Virtualisation and Mainframe Team?? 
IT Services > Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard > Sent: 13 June 2018 10:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > > On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > > Hello, > > > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > > Not sure whether it is due to kernel or GPFS. > > > > GPFS being not supported on 7.5 at this time would be the starting point. I > am also under the impression that kernel NFS was not supported either it's > Ganesha or nothing. > > The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the > past that has worked for me. > > JAB. > > -- > Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System > Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > End of gpfsug-discuss Digest, Vol 77, Issue 19 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From neil.wilson at metoffice.gov.uk Fri Jul 27 13:26:28 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 27 Jul 2018 12:26:28 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: We are still running 7.4 with 4.2.3-9 on our NSD nodes, cNFS nodes and client nodes. A rhel 7.5 client node build is being tested at the moment and will be deployed if testing is a success. However I don't think we will be upgrading the NSD nodes or cNFS nodes to 7.5 for a while. Regards Neil Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of IBM Spectrum Scale Sent: 27 July 2018 12:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. [Inactive hide details for Brunet Pierre-Marie ---07/26/2018 07:17:25 PM---Hi, We are facing the same issue : we just upgrade o]Brunet Pierre-Marie ---07/26/2018 07:17:25 PM---Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? Regards, PM -- HPC center French space agency -----Message d'origine----- De : gpfsug-discuss-bounces at spectrumscale.org > De la part de gpfsug-discuss-request at spectrumscale.org Envoy? : jeudi 14 juin 2018 13:00 ? : gpfsug-discuss at spectrumscale.org Objet : gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: > Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: > Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From pierre-marie.brunet at cnes.fr Fri Jul 27 14:56:04 2018 From: pierre-marie.brunet at cnes.fr (Brunet Pierre-Marie) Date: Fri, 27 Jul 2018 13:56:04 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (IBM Spectrum Scale) Message-ID: Hi Scale Team, I know but I can't reproduce the problem with a simple kernel NFS server on a RH7.5 with a local filesystem for instance. It seems to be linked somehow with GPFS 4.2.3-9... I don't know what is the behavior with previous release. But as I said, the downgrade to RHE7.4 has solved the problem... vicious bug for sure. 
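For anyone else chasing the same symptom: the "Unknown error 521" text simply means strerror(3) has no name for value 521, because userspace errno definitions stop well below it; 521 exists only in the kernel's internal errno list as EBADHANDLE, the NFS "illegal file handle" error quoted above. A rough way to confirm that on a client is sketched below (it assumes perl and the kernel-devel package for the running kernel are installed; paths can differ between distributions):

# strerror(3) has no text for errno 521, hence "Unknown error 521" in client logs
perl -e '$! = 521; print "$!\n"'

# userspace errno values end far below 521
tail -n 5 /usr/include/asm-generic/errno.h

# 521 is defined only in the kernel's internal list, as EBADHANDLE
grep EBADHANDLE /usr/src/kernels/$(uname -r)/include/linux/errno.h

Seeing 521 on a client therefore only tells you that knfsd handed back a bad file handle; whether the handle went stale in the kernel or in the GPFS VFS layer needs the server-side traces the Scale team asks for later in this thread.
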
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: vendredi 27 juillet 2018 14:22 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 78, Issue 68 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (IBM Spectrum Scale) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (Lukas Hejtmanek) ---------------------------------------------------------------------- Message: 1 Date: Fri, 27 Jul 2018 06:56:02 -0500 From: "IBM Spectrum Scale" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: Content-Type: text/plain; charset="iso-8859-1" errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Brunet Pierre-Marie To: "gpfsug-discuss at spectrumscale.org" Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? 
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. 
It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ Message: 2 Date: Fri, 27 Jul 2018 14:06:11 +0200 From: Lukas Hejtmanek To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: <20180727120611.aunjlxht33vp7txf at ics.muni.cz> Content-Type: text/plain; charset=utf8 Hello, no it is not. It's a bug in GPFS vfs layer, efix has been already released. On Fri, Jul 27, 2018 at 06:56:02AM -0500, IBM Spectrum Scale wrote: > > errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. > > > /* Defined for the NFSv3 protocol */ > #define EBADHANDLE 521 /* Illegal NFS file handle */ > > > Regards, The Spectrum Scale (GPFS) team > > ---------------------------------------------------------------------- > -------------------------------------------- > > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks > Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please > contact > 1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be > used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/26/2018 07:17 PM > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > We are facing the same issue : we just upgrade our cluster to GPFS > 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... > => random "Unknown error 521" on NFS clients. > > Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers > crossed !) up to now, it seems to work properly. > > Is there any official recommendation from IBM on this problem ? 
> > Regards, > PM > -- > HPC center > French space agency > > -----Message d'origine----- > De?: gpfsug-discuss-bounces at spectrumscale.org > De la part de > gpfsug-discuss-request at spectrumscale.org > Envoy??: jeudi 14 juin 2018 13:00 > ??: gpfsug-discuss at spectrumscale.org > Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than > "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) > 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 13 Jun 2018 17:45:44 +0300 > From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > llabserv.com> > > > Content-Type: text/plain; charset="iso-8859-1" > > Please open a service ticket > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will > > add HA > > > to NFS on top of GPFS - > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.sp > ectrum.scale.v5r01.doc/bl1adv_cnfs.htm > > > > ). > > > > knfs and cNFS can't coexist with CES in the same environment. > > well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > -------------- next part -------------- An HTML attachment was > scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fb > ce/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 13 Jun 2018 15:14:53 +0000 > From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > changelabs.com> > > > Content-Type: text/plain; charset="utf-8" > > We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version > 3.10.0-693.21.1.el7.x86_64 and are not having any errors. > So it's probably just GPFS not being ready for 7.5 yet. > > Neil. > > Neil Wilson? Senior IT Practitioner > Storage, Virtualisation and Mainframe Team?? 
IT Services Met Office > FitzRoy Road Exeter Devon EX1 3PB United Kingdom > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan > Buzzard > Sent: 13 June 2018 10:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > > On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > > Hello, > > > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > > Not sure whether it is due to kernel or GPFS. > > > > GPFS being not supported on 7.5 at this time would be the starting > point. I am also under the impression that kernel NFS was not > supported either it's Ganesha or nothing. > > The interim fix is probably to downgrade to a 7.4 kernel. Certainly in > the past that has worked for me. > > JAB. > > -- > Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC > System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > End of gpfsug-discuss Digest, Vol 77, Issue 19 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 68 ********************************************** From scale at us.ibm.com Fri Jul 27 15:43:16 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 09:43:16 -0500 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> References: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> Message-ID: There is a fix in 4.2.3.9 efix3 that corrects a condition where GPFS was failing a revalidate call and that was causing kNFS to generate EBADHANDLE. Without more information on your case (traces), I cannot say for sure that this will resolve your issue, but it is available for you to try. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:18:50 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:18:50 +0000 Subject: [gpfsug-discuss] Power9 / GPFS Message-ID: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? 
So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 27 16:30:42 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 27 Jul 2018 15:30:42 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> Hi Simon, Have you tried running it with the ??silent? flag, too? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 27, 2018, at 10:18 AM, Simon Thompson > wrote: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. 
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338&sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Fri Jul 27 16:32:55 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 15:32:55 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: <366795a1f7b34edc985d85124f787774@jumptrading.com> References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Bryan Banister Reply-To: gpfsug main discussion list Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. 
I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
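Before changing anything it is worth seeing where the manager roles actually land, along the lines of Peter's "what does mmlsmgr show?". A rough sequence, run from a node of the cluster that owns the file systems (the output will of course differ per cluster):

# which node holds the cluster manager role and each file system manager role
mmlsmgr

# which nodes currently act as token servers for each file system (token domain)
mmdiag --tokenmgr

# which nodes are even eligible, i.e. carry the "manager" designation
mmlscluster | grep -i manager
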
-------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Fri Jul 27 16:40:11 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 15:40:11 +0000 Subject: [gpfsug-discuss] Power9 / GPFS Message-ID: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> Hello If you don?t need the installer maybe just extract the RPMs, this bypasses java. For x86_64 I use commands like the once below, shouldn?t be much different on power. TARFILE=$1 START=$( grep -a -m 1 ^PGM_BEGIN_TGZ= $TARFILE| cut -d= -f2) echo extract RPMs from $TARFILE with START=$START tail -n +$START $TARFILE | tar xvzf - *.rpm */repodata/* Kind regards, Heiner -- Paul Scherrer Institut From: on behalf of Simon Thompson Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:19 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:41:39 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:41:39 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> References: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> Message-ID: Yeah does the same ? The system java seems to do it is well ? maybe its just broken ? 
Simon From: on behalf of "Buterbaugh, Kevin L" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 16:32 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Hi Simon, Have you tried running it with the ??silent? flag, too? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 27, 2018, at 10:18 AM, Simon Thompson > wrote: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338&sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Jul 27 16:35:14 2018 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 27 Jul 2018 15:35:14 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: License acceptance notwithstanding, the RPM extraction should at least be achievable with? 
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: Friday, July 27, 2018 11:19 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:54:16 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:54:16 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> References: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> Message-ID: <986024E4-512D-45A0-A859-EBED468B07A3@bham.ac.uk> Thanks, (and also Paul with a very similar comment)? I now have my packages unpacked ? and hey, who needs java anyway ? Simon From: on behalf of "heiner.billich at psi.ch" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 16:40 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Hello If you don?t need the installer maybe just extract the RPMs, this bypasses java. For x86_64 I use commands like the once below, shouldn?t be much different on power. 
TARFILE=$1 START=$( grep -a -m 1 ^PGM_BEGIN_TGZ= $TARFILE| cut -d= -f2) echo extract RPMs from $TARFILE with START=$START tail -n +$START $TARFILE | tar xvzf - *.rpm */repodata/* Kind regards, Heiner -- Paul Scherrer Institut From: on behalf of Simon Thompson Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:19 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 27 17:02:42 2018 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 27 Jul 2018 11:02:42 -0500 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? 
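Putting Heiner's and Paul's hints together, a generic sketch for unpacking the RPMs from one of these self-extracting installers without touching the bundled Java license tool could look like the following. It reads the payload offset from the installer itself rather than hard-coding the 620 seen in the 5.0.1.1 package, and assumes the PGM_BEGIN_TGZ marker is present as it is in the releases discussed here:

#!/bin/bash
# Extract the RPMs and repo metadata from a Spectrum Scale self-extracting
# installer without running the bundled Java license acceptance tool.
set -euo pipefail

installer=$1                    # e.g. Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install
destdir=${2:-./extracted}

# The installer records the line where the embedded tgz payload starts.
start=$(grep -a -m1 '^PGM_BEGIN_TGZ=' "$installer" | cut -d= -f2)

mkdir -p "$destdir"
# Skip the shell preamble and untar only the packages and repodata.
tail -n "+$start" "$installer" | tar -C "$destdir" --wildcards -xvzf - '*.rpm' '*/repodata/*'

echo "packages extracted to $destdir"

The license still applies, of course; this only works around the segfaulting JRE during extraction.
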
So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 26117 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 17:05:37 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 16:05:37 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> # uname -a Linux localhost.localdomain 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux Its literally out of the box ? Simon From: on behalf of "gcorneau at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com [cid:_2_DC560798DC56051000576CD7862582D7] From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I feel like I must be doing something stupid here but ? 
We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 26118 bytes Desc: image001.jpg URL: From heiner.billich at psi.ch Fri Jul 27 17:50:17 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 16:50:17 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Hello, So probably I was wrong from the beginning ? please can somebody clarify: In a multicluster environment with all storage and filesystem hosted by a single cluster all token managers will reside in this central cluster? Or are there also token managers in the storage-less clusters which just mount? This managers wouldn?t be accessible by all nodes which access the file system, hence I doubt this exists. Still it would be nice to know how to influence the token manager placement and how to exclude certain machines. And the output of ?mmdiag ?tokenmgr? indicates that there _are_ token manager in the remote-mounting cluster ? confusing. I would greatly appreciate if somebody could sort this out. A point to the relevant documentation would also be welcome. 
Thank you & Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of "Billich Heinrich Rainer (PSI)" Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn't mention but probably should have: this is a multicluster mount, the cluster has no storage of its own. Hence the filesystem managers are on the home cluster, according to mmlsmgr. Hm, probably more complicated than I initially thought. Still I would expect that for file access that is restricted to this cluster all token management is handled inside the cluster, too? And I don't want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Bryan Banister Reply-To: gpfsug main discussion list Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn't a way to specify a preferred manager per FS... (Bryan starts typing up a new RFE...). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine.
Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jul 27 18:09:56 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 27 Jul 2018 17:09:56 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: <40989560bbc0448896e0301407388790@jumptrading.com> Yes, the token managers will reside on the NSD Server Cluster which has the NSD Servers that provide access to the underlying data and metadata storage. I believe that all nodes that have the ?manager? designation will participate in the token management operations as needed. Though there is not a way to specify which node will be assigned the primary file system manager or overall cluster manager, which are two different roles but may reside on the same node. Tokens themselves, however, are distributed and managed by clients directly. When a file is first opened then the node that opened the file will be the ?metanode? 
for the file, and all metadata updates on the file will be handled by this metanode until it closes the file handle, in which case another node will become the ?metanode?. For byte range locking, the file system manager will handle revoking tokens from nodes that have a byte range lock when another node requests access to the same byte range region. This ensures that nodes cannot hold byte range locks that prevent other nodes from accessing byte range regions of a file. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Billich Heinrich Rainer (PSI) Sent: Friday, July 27, 2018 11:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ Hello, So probably I was wrong from the beginning ? please can somebody clarify: In a multicluster environment with all storage and filesystem hosted by a single cluster all token managers will reside in this central cluster? Or are there also token managers in the storage-less clusters which just mount? This managers wouldn?t be accessible by all nodes which access the file system, hence I doubt this exists. Still it would be nice to know how to influence the token manager placement and how to exclude certain machines. And the output of ?mmdiag ?tokenmgr? indicates that there _are_ token manager in the remote-mounting cluster ? confusing. I would greatly appreciate if somebody could sort this out. A point to the relevant documentation would also be welcome. Thank you & Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: > on behalf of "Billich Heinrich Rainer (PSI)" > Reply-To: gpfsug main discussion list > Date: Friday 27 July 2018 at 17:33 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: > on behalf of Bryan Banister > Reply-To: gpfsug main discussion list > Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. 
I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 27 18:31:46 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 12:31:46 -0500 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Only nodes in the home cluster will participate as token managers. Note that "mmdiag --tokenmgr" lists all potential token manager nodes, but there will be additional information for the nodes that are currently appointed. --tokenmgr Displays information about token management. For each mounted GPFS file system, one or more token manager nodes is appointed. The first token manager is always colocated with the file system manager, while other token managers can be appointed from the pool of nodes with the manager designation. The information that is shown here includes the list of currently appointed token manager nodes and, if the current node is serving as a token manager, some statistics about prior token transactions. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 27 19:27:19 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 20:27:19 +0200 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data In-Reply-To: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> Message-ID: Hi, as there are more often similar questions rising, we just put an article about the topic on the Spectrum Scale Wiki https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20 (GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data While there will be some minor updates on the article in the next time, it might already explain your questions. 
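(An illustrative aside, not taken from the wiki article above and using made-up numbers: one way a doubled rate can show up after changing a sensor period from 1 to 2 seconds is when samples that each cover a 2-second window are still divided by, or summed into, 1-second buckets. A toy shell example, columns are seconds and cumulative MB written, sampled every 2 seconds:

printf '0 0\n2 200\n4 400\n' | awk 'NR>1 {print ($2-prev)/($1-pt) " MB/s"} {pt=$1; prev=$2}'

Dividing each delta by the real 2-second interval yields 100 MB/s; dividing the same deltas by an assumed 1-second interval would report 200 MB/s, which matches the factor-of-two effect described in the question quoted below.)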
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 13.07.2018 12:08 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Achim.Rehor at de.ibm.com Sat Jul 28 10:16:04 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Sat, 28 Jul 2018 11:16:04 +0200 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 30 16:27:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 30 Jul 2018 15:27:28 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> References: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> Message-ID: <24C8CF4A-D0D9-4DC0-B499-6B64D50DF3BC@bham.ac.uk> Just to close the loop on this, this is a bug in the RHEL7.5 first shipped alt kernel for the P9 systems. Patching to a later kernel errata package fixed the issues. I?ve confirmed that upgrading and re-running the installer works fine. Thanks to Julian who contacted me off-list about this. Simon From: on behalf of Simon Thompson Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:06 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS # uname -a Linux localhost.localdomain 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux Its literally out of the box ? Simon From: on behalf of "gcorneau at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com [cid:_2_DC560798DC56051000576CD7862582D7] From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. 
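(For anyone who lands here with the same segfault: as noted at the top of this message, the trigger was the first-shipped RHEL 7.5 ALT kernel on the Power9 systems, and moving to a later errata kernel fixed it. A quick sanity check before re-running the installer - the two kernel strings below are taken from this thread; the update command is only the generic approach and exact package naming for the ALT kernel may differ in your repositories:

uname -r
# 4.14.0-49.el7a.ppc64le      -> GA kernel that showed the JRE segfault
# 4.14.0-49.2.2.el7a.ppc64le  -> zStream/errata kernel reported to work
yum update 'kernel*' && reboot

)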
First off, can someone confirm that "Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install" is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 26119 bytes Desc: image001.jpg URL: From Renar.Grunenberg at huk-coburg.de Tue Jul 31 10:03:54 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 31 Jul 2018 09:03:54 +0000 Subject: [gpfsug-discuss] Question about mmsdrrestore Message-ID: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> Hallo All, does anyone have experience with upgrading existing nodes in a GPFS 4.2.3.x cluster (OS RHEL 6.7) by doing a fresh OS install to RHEL 7.5, installing the new GPFS 5.0.1.1 code, and then running mmsdrrestore on these nodes from a 4.2.3 node? Is that possible, or must we install the 4.2.3 code first, do the mmsdrrestore step and then update to 5.0.1.1? Any hints are appreciated. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. ________________________________ Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 31 10:09:37 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 31 Jul 2018 09:09:37 +0000 Subject: [gpfsug-discuss] Question about mmsdrrestore In-Reply-To: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> References: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> Message-ID: My gut feeling says it's not possible. If this were me I'd upgrade to 5.0.1.1, make sure it's working, and then reinstall the node. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 31 July 2018 10:04 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Question about mmsdrrestore Hallo All, are there some experiences about the possibility to install/upgrade some existing nodes in a GPFS 4.2.3.x Cluster (OS Rhel6.7) with a fresh OS install to rhel7.5 and reinstall then new GPFS code 5.0.1.1 and do a mmsdrrestore on these node from a 4.2.3 Node. Is it possible, or must we install 4.2.3 Code first, make the mmsdrestore step and then update to 5.0.1.1? Any Hints are appreciate. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jul 31 14:03:52 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 31 Jul 2018 13:03:52 +0000 Subject: [gpfsug-discuss] mmdf vs. df Message-ID: Hallo All, a question whats happening here: We are on GPFS 5.0.1.1 and host a TSM-Server-Cluster. A colleague from me want to add new nsd?s to grow its tsm-storagepool (filedevice class volumes). The tsmpool fs has before 45TB of space after that 128TB. We create new 50 GB tsm-volumes with define volume cmd, but the cmd goes in error after the allocating of 89TB. 
Following Outputs here: [root at node_a tsmpool]# df -hT Filesystem Type Size Used Avail Use% Mounted on tsmpool gpfs 128T 128T 44G 100% /gpfs/tsmpool root at node_a tsmpool]# mmdf tsmpool --block-size auto disk disk size failure holds holds free free name group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: system (Maximum disk size allowed is 839.99 GB) nsd_r2g8f_tsmpool_001 100G 0 Yes No 88G ( 88%) 10.4M ( 0%) nsd_c4g8f_tsmpool_001 100G 1 Yes No 88G ( 88%) 10.4M ( 0%) nsd_g4_tsmpool 256M 2 No No 0 ( 0%) 0 ( 0%) ------------- -------------------- ------------------- (pool total) 200.2G 176G ( 88%) 20.8M ( 0%) Disks in storage pool: data01 (Maximum disk size allowed is 133.50 TB) nsd_r2g8d_tsmpool_016 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_015 8T 0 No Yes 3.205T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_014 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_013 8T 0 No Yes 3.206T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_012 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_011 8T 0 No Yes 3.205T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_001 8T 0 No Yes 1.48G ( 0%) 14.49M ( 0%) nsd_r2g8d_tsmpool_002 8T 0 No Yes 1.582G ( 0%) 16.12M ( 0%) nsd_r2g8d_tsmpool_003 8T 0 No Yes 1.801G ( 0%) 14.7M ( 0%) nsd_r2g8d_tsmpool_004 8T 0 No Yes 1.629G ( 0%) 15.21M ( 0%) nsd_r2g8d_tsmpool_005 8T 0 No Yes 1.609G ( 0%) 14.22M ( 0%) nsd_r2g8d_tsmpool_006 8T 0 No Yes 1.453G ( 0%) 17.4M ( 0%) nsd_r2g8d_tsmpool_010 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_009 8T 0 No Yes 3.197T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_007 8T 0 No Yes 3.194T ( 40%) 7.875M ( 0%) nsd_r2g8d_tsmpool_008 8T 0 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_016 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_006 8T 1 No Yes 888M ( 0%) 21.63M ( 0%) nsd_c4g8d_tsmpool_005 8T 1 No Yes 996M ( 0%) 18.22M ( 0%) nsd_c4g8d_tsmpool_004 8T 1 No Yes 920M ( 0%) 11.21M ( 0%) nsd_c4g8d_tsmpool_003 8T 1 No Yes 984M ( 0%) 14.7M ( 0%) nsd_c4g8d_tsmpool_002 8T 1 No Yes 1.082G ( 0%) 11.89M ( 0%) nsd_c4g8d_tsmpool_001 8T 1 No Yes 1.035G ( 0%) 14.49M ( 0%) nsd_c4g8d_tsmpool_007 8T 1 No Yes 3.281T ( 41%) 7.867M ( 0%) nsd_c4g8d_tsmpool_008 8T 1 No Yes 3.199T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_009 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_010 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_011 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_012 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_013 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_014 8T 1 No Yes 3.195T ( 40%) 7.875M ( 0%) nsd_c4g8d_tsmpool_015 8T 1 No Yes 3.194T ( 40%) 7.867M ( 0%) ------------- -------------------- ------------------- (pool total) 256T 64.09T ( 25%) 341.6M ( 0%) ============= ==================== =================== (data) 256T 64.09T ( 25%) 341.6M ( 0%) (metadata) 200G 176G ( 88%) 20.8M ( 0%) ============= ==================== =================== (total) 256.2T 64.26T ( 25%) 362.4M ( 0%) In GPFS we had already space but the above df seems to be wrong and that make TSM unhappy. 
If we manually wrote a 50GB File in this FS like: [root at sap00733 tsmpool]# dd if=/dev/zero of=/gpfs/tsmpool/output bs=2M count=25600 25600+0 records in 25600+0 records out 53687091200 bytes (54 GB) copied, 30.2908 s, 1.8 GB/s We see at df level now these: [root at sap00733 tsmpool]# df -hT Filesystem Type Size Used Avail Use% Mounted on tsmpool gpfs 128T 96T 33T 75% /gpfs/tsmpool if we delete these file we see already the first output of 44G free space only. This seems to be the os df Interface seems to be brocken here. What I also must mentioned we use some ignore parameters: root @node_a(rhel7.4)> mmfsadm dump config |grep ignore ignoreNonDioInstCount 0 ! ignorePrefetchLUNCount 1 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ! ignoreReplicationOnStatfs 1 ignoreSync 0 the fs has the -S relatime option. Are there any Known bug here existend ? Any hints on that? Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.utermann at physik.uni-augsburg.de Tue Jul 31 16:02:51 2018 From: ralf.utermann at physik.uni-augsburg.de (Ralf Utermann) Date: Tue, 31 Jul 2018 17:02:51 +0200 Subject: [gpfsug-discuss] Question about mmsdrrestore In-Reply-To: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> References: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> Message-ID: <1de976d6-bc61-b1ff-b953-b28886f8e2c4@physik.uni-augsburg.de> Hi Renar, we reinstalled a previous Debian jessie + GPFS 4.2.3 client to Ubuntu 16.04 + GPFS 5.0.1-1 and did a mmsdrrestore from one of our 4.2.3.8 NSD servers without problems. regards, Ralf On 31.07.2018 11:03, Grunenberg, Renar wrote: > Hallo All, > > ? > > are there some experiences about the possibility to install/upgrade some > existing nodes in a GPFS 4.2.3.x Cluster (OS Rhel6.7) with a fresh OS install to > rhel7.5 and reinstall then new GPFS code 5.0.1.1 > > and do a mmsdrrestore on these node from a 4.2.3 Node. Is it possible, or must > we install 4.2.3 Code first, make the mmsdrestore step and then update to 5.0.1.1? > > Any Hints are appreciate. > > Renar?Grunenberg > Abteilung Informatik ? 
Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Ralf Utermann _____________________________________________________________________ Universität Augsburg, Institut für Physik -- EDV-Betreuer Universitätsstr.1 D-86135 Augsburg Phone: +49-821-598-3231 SMTP: Ralf.Utermann at Physik.Uni-Augsburg.DE Fax: -3411 
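(To make the procedure Ralf describes a little more concrete - a minimal sketch only; the node name is a placeholder and the package/portability-layer steps depend on your distribution:

# on the freshly reinstalled node, after installing the GPFS 5.0.1.x packages
# and rebuilding the portability layer (mmbuildgpl):
/usr/lpp/mmfs/bin/mmsdrrestore -p <healthy-cluster-node> -R /usr/bin/scp
/usr/lpp/mmfs/bin/mmstartup
/usr/lpp/mmfs/bin/mmgetstate -a

mmsdrrestore pulls the current cluster configuration from the named node, after which the reinstalled node can be started and checked with mmgetstate.)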
From YARD at il.ibm.com Sun Jul 1 18:17:42 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 1 Jul 2018 20:17:42 +0300 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Message-ID: Hi, there was an issue with a Scale 5.x GUI error - ib_rdma_nic_unrecognized(mlx5_0/2) Check if you have the patch: [root at gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py 229c229,230 < recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) --- > #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) > recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm)) And restart the monitor - mmsysmoncontrol restart Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 06/28/2018 11:16 AM Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Sent by: gpfsug-discuss-bounces at spectrumscale.org Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've a GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts !
verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From oehmes at gmail.com Mon Jul 2 06:26:16 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 2 Jul 2018 07:26:16 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi, most traditional raid controllers can't deal well with blocksizes above 4m, which is why the new default is 4m and i would leave it at that unless you know for sure you get better performance with 8mb which typically requires your raid controller volume full block size to be 8mb with maybe a 8+2p @1mb strip size (many people confuse strip size with full track size) . if you don't have dedicated SSDs for metadata i would recommend to just use a 4mb blocksize with mixed data and metadata disks, if you have a reasonable number of SSD's put them in a raid 1 or raid 10 and use them as dedicated metadata and the other disks as dataonly , but i would not use the --metadata-block-size parameter as it prevents the datapool to use large number of subblocks. 
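(A rough sketch of the layout described above - the stanza file and NSD names here are invented, only the shape of the commands matters: SSD NSDs flagged usage=metadataOnly in the system pool, HDD NSDs flagged usage=dataOnly in a data pool, a 4 MiB block size and no --metadata-block-size, so the whole file system keeps the small 8 KiB subblock:

mmcrnsd -F nsd_stanzas.txt
mmcrfs fs1 -F nsd_stanzas.txt -B 4M -m 2 -M 2 -r 1 -R 2 -j cluster -T /gpfs/fs1

)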
as long as your SSDs are on raid 1 or 10 there is no read/modify/write penalty, so using them with the 4mb blocksize has no real negative impact at least on controllers i have worked with. hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > On 6/26/18 12:21 AM, Sven Oehme wrote: > > Joseph, > > the subblocksize will be derived from the smallest blocksize in the > filesytem, given you specified a metadata block size of 512k thats what > will be used to calculate the number of subblocks, even your data pool is > 4mb. > is this setup for a traditional NSD Setup or for GNR as the > recommendations would be different. > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: > >> Quick question, anyone know why GPFS wouldn't respect the default for >> the subblocks-per-full-block parameter when creating a new filesystem? >> I'd expect it to be set to 512 for an 8MB block size but my guess is >> that also specifying a metadata-block-size is interfering with it (by >> being too small). This was a parameter recommended by the vendor for a >> 4.2 installation with metadata on dedicated SSDs in the system pool, any >> best practices for 5.0? I'm guessing I'd have to bump it up to at least >> 4MB to get 512 subblocks for both pools. >> >> fs1 created with: >> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >> /gpfs/fs1 >> >> # mmlsfs fs1 >> >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) >> size in bytes (system pool) >> 131072 Minimum fragment (subblock) >> size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> >> -B 524288 Block size (system pool) >> 8388608 Block size (other pools) >> >> -V 19.01 (5.0.1.0) File system version >> >> --subblocks-per-full-block 64 Number of subblocks per >> full block >> -P system;DATA Disk storage pools in file >> system >> >> >> Thanks! >> --Joey Mendoza >> NCAR >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantllama at gmail.com Mon Jul 2 07:55:07 2018 From: mutantllama at gmail.com (Carl) Date: Mon, 2 Jul 2018 16:55:07 +1000 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi Sven, What is the resulting indirect-block size with a 4mb metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k? Cheers, Carl. On Mon, 2 Jul 2018 at 15:26, Sven Oehme wrote: > Hi, > > most traditional raid controllers can't deal well with blocksizes above > 4m, which is why the new default is 4m and i would leave it at that unless > you know for sure you get better performance with 8mb which typically > requires your raid controller volume full block size to be 8mb with maybe a > 8+2p @1mb strip size (many people confuse strip size with full track > size) . 
> if you don't have dedicated SSDs for metadata i would recommend to just > use a 4mb blocksize with mixed data and metadata disks, if you have a > reasonable number of SSD's put them in a raid 1 or raid 10 and use them as > dedicated metadata and the other disks as dataonly , but i would not use > the --metadata-block-size parameter as it prevents the datapool to use > large number of subblocks. > as long as your SSDs are on raid 1 or 10 there is no read/modify/write > penalty, so using them with the 4mb blocksize has no real negative impact > at least on controllers i have worked with. > > hope this helps. > > On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > >> Hi, it's for a traditional NSD setup. >> >> --Joey >> >> On 6/26/18 12:21 AM, Sven Oehme wrote: >> >> Joseph, >> >> the subblocksize will be derived from the smallest blocksize in the >> filesytem, given you specified a metadata block size of 512k thats what >> will be used to calculate the number of subblocks, even your data pool is >> 4mb. >> is this setup for a traditional NSD Setup or for GNR as the >> recommendations would be different. >> >> sven >> >> On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: >> >>> Quick question, anyone know why GPFS wouldn't respect the default for >>> the subblocks-per-full-block parameter when creating a new filesystem? >>> I'd expect it to be set to 512 for an 8MB block size but my guess is >>> that also specifying a metadata-block-size is interfering with it (by >>> being too small). This was a parameter recommended by the vendor for a >>> 4.2 installation with metadata on dedicated SSDs in the system pool, any >>> best practices for 5.0? I'm guessing I'd have to bump it up to at least >>> 4MB to get 512 subblocks for both pools. >>> >>> fs1 created with: >>> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >>> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >>> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >>> /gpfs/fs1 >>> >>> # mmlsfs fs1 >>> >>> >>> flag value description >>> ------------------- ------------------------ >>> ----------------------------------- >>> -f 8192 Minimum fragment (subblock) >>> size in bytes (system pool) >>> 131072 Minimum fragment (subblock) >>> size in bytes (other pools) >>> -i 4096 Inode size in bytes >>> -I 32768 Indirect block size in bytes >>> >>> -B 524288 Block size (system pool) >>> 8388608 Block size (other pools) >>> >>> -V 19.01 (5.0.1.0) File system version >>> >>> --subblocks-per-full-block 64 Number of subblocks per >>> full block >>> -P system;DATA Disk storage pools in file >>> system >>> >>> >>> Thanks! >>> --Joey Mendoza >>> NCAR >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Mon Jul 2 08:46:25 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 2 Jul 2018 09:46:25 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Jul 2 08:55:10 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 2 Jul 2018 09:55:10 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Olaf, he is talking about indirect size not subblock size . Carl, here is a screen shot of a 4mb filesystem : [root at p8n15hyp ~]# mmlsfs all_local File system attributes for /dev/fs2-4m-07: ========================================== flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 4194304 Block size -Q none Quotas accounting enabled none Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Mon Jun 18 12:30:54 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 4000000000 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in file system -A no Automatic mount option -o none Additional mount options -T /gpfs/fs2-4m-07 Default mount point --mount-priority 0 Mount priority as you can see indirect size is 32k sven On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: > HI Carl, > 8k for 4 M Blocksize > files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at > least one "subblock" be allocated .. > > in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is > retrieved from the blocksize ... > since R >5 (so new created file systems) .. the new default block size is > 4 MB, fragment size is 8k (512 subblocks) > for even larger block sizes ... more subblocks are available per block > so e.g. > 8M .... 1024 subblocks (fragment size is 8 k again) > > @Sven.. correct me, if I'm wrong ... > > > > > > > From: Carl > > To: gpfsug main discussion list > Date: 07/02/2018 08:55 AM > Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Sven, > > What is the resulting indirect-block size with a 4mb metadata block size? 
> > Does the new sub-block magic mean that it will take up 32k, or will it > occupy 128k? > > Cheers, > > Carl. > > > On Mon, 2 Jul 2018 at 15:26, Sven Oehme <*oehmes at gmail.com* > > wrote: > Hi, > > most traditional raid controllers can't deal well with blocksizes above > 4m, which is why the new default is 4m and i would leave it at that unless > you know for sure you get better performance with 8mb which typically > requires your raid controller volume full block size to be 8mb with maybe a > 8+2p @1mb strip size (many people confuse strip size with full track size) . > if you don't have dedicated SSDs for metadata i would recommend to just > use a 4mb blocksize with mixed data and metadata disks, if you have a > reasonable number of SSD's put them in a raid 1 or raid 10 and use them as > dedicated metadata and the other disks as dataonly , but i would not use > the --metadata-block-size parameter as it prevents the datapool to use > large number of subblocks. > as long as your SSDs are on raid 1 or 10 there is no read/modify/write > penalty, so using them with the 4mb blocksize has no real negative impact > at least on controllers i have worked with. > > hope this helps. > > On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <*jam at ucar.edu* > > wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > > On 6/26/18 12:21 AM, Sven Oehme wrote: > Joseph, > > the subblocksize will be derived from the smallest blocksize in the > filesytem, given you specified a metadata block size of 512k thats what > will be used to calculate the number of subblocks, even your data pool is > 4mb. > is this setup for a traditional NSD Setup or for GNR as the > recommendations would be different. > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <*jam at ucar.edu* > > wrote: > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small). This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0? I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes (system pool) > 131072 Minimum fragment (subblock) > size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > > -B 524288 Block size (system pool) > 8388608 Block size (other pools) > > -V 19.01 (5.0.1.0) File system version > > --subblocks-per-full-block 64 Number of subblocks per > full block > -P system;DATA Disk storage pools in file > system > > > Thanks! 
> --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantllama at gmail.com Mon Jul 2 10:57:11 2018 From: mutantllama at gmail.com (Carl) Date: Mon, 2 Jul 2018 19:57:11 +1000 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Thanks Olaf and Sven, It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata) is no longer relevant for version 5. Any idea if its likely to be updated soon? The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms there any situations where you would recommend using less than the new default block size? Cheers, Carl. On Mon, 2 Jul 2018 at 17:55, Sven Oehme wrote: > Olaf, he is talking about indirect size not subblock size . > > Carl, > > here is a screen shot of a 4mb filesystem : > > [root at p8n15hyp ~]# mmlsfs all_local > > File system attributes for /dev/fs2-4m-07: > ========================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 4194304 Block size > -Q none Quotas accounting enabled > none Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Mon Jun 18 12:30:54 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 4000000000 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? 
> --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 512 Number of subblocks per full > block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in > file system > -A no Automatic mount option > -o none Additional mount options > -T /gpfs/fs2-4m-07 Default mount point > --mount-priority 0 Mount priority > > as you can see indirect size is 32k > > sven > > On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: > >> HI Carl, >> 8k for 4 M Blocksize >> files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at >> least one "subblock" be allocated .. >> >> in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is >> retrieved from the blocksize ... >> since R >5 (so new created file systems) .. the new default block size is >> 4 MB, fragment size is 8k (512 subblocks) >> for even larger block sizes ... more subblocks are available per block >> so e.g. >> 8M .... 1024 subblocks (fragment size is 8 k again) >> >> @Sven.. correct me, if I'm wrong ... >> >> >> >> >> >> >> From: Carl >> >> To: gpfsug main discussion list >> Date: 07/02/2018 08:55 AM >> Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> Hi Sven, >> >> What is the resulting indirect-block size with a 4mb metadata block size? >> >> Does the new sub-block magic mean that it will take up 32k, or will it >> occupy 128k? >> >> Cheers, >> >> Carl. >> >> >> On Mon, 2 Jul 2018 at 15:26, Sven Oehme <*oehmes at gmail.com* >> > wrote: >> Hi, >> >> most traditional raid controllers can't deal well with blocksizes above >> 4m, which is why the new default is 4m and i would leave it at that unless >> you know for sure you get better performance with 8mb which typically >> requires your raid controller volume full block size to be 8mb with maybe a >> 8+2p @1mb strip size (many people confuse strip size with full track size) . >> if you don't have dedicated SSDs for metadata i would recommend to just >> use a 4mb blocksize with mixed data and metadata disks, if you have a >> reasonable number of SSD's put them in a raid 1 or raid 10 and use them as >> dedicated metadata and the other disks as dataonly , but i would not use >> the --metadata-block-size parameter as it prevents the datapool to use >> large number of subblocks. >> as long as your SSDs are on raid 1 or 10 there is no read/modify/write >> penalty, so using them with the 4mb blocksize has no real negative impact >> at least on controllers i have worked with. >> >> hope this helps. >> >> On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <*jam at ucar.edu* >> > wrote: >> Hi, it's for a traditional NSD setup. >> >> --Joey >> >> >> On 6/26/18 12:21 AM, Sven Oehme wrote: >> Joseph, >> >> the subblocksize will be derived from the smallest blocksize in the >> filesytem, given you specified a metadata block size of 512k thats what >> will be used to calculate the number of subblocks, even your data pool is >> 4mb. >> is this setup for a traditional NSD Setup or for GNR as the >> recommendations would be different. >> >> sven >> >> On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <*jam at ucar.edu* >> > wrote: >> Quick question, anyone know why GPFS wouldn't respect the default for >> the subblocks-per-full-block parameter when creating a new filesystem? 
>> I'd expect it to be set to 512 for an 8MB block size but my guess is >> that also specifying a metadata-block-size is interfering with it (by >> being too small). This was a parameter recommended by the vendor for a >> 4.2 installation with metadata on dedicated SSDs in the system pool, any >> best practices for 5.0? I'm guessing I'd have to bump it up to at least >> 4MB to get 512 subblocks for both pools. >> >> fs1 created with: >> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >> /gpfs/fs1 >> >> # mmlsfs fs1 >> >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) >> size in bytes (system pool) >> 131072 Minimum fragment (subblock) >> size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> >> -B 524288 Block size (system pool) >> 8388608 Block size (other pools) >> >> -V 19.01 (5.0.1.0) File system version >> >> --subblocks-per-full-block 64 Number of subblocks per >> full block >> -P system;DATA Disk storage pools in file >> system >> >> >> Thanks! >> --Joey Mendoza >> NCAR >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lore at cscs.ch Mon Jul 2 14:50:37 2018 From: lore at cscs.ch (Lo Re Giuseppe) Date: Mon, 2 Jul 2018 13:50:37 +0000 Subject: [gpfsug-discuss] Zimon metrics details Message-ID: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. 
Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jul 2 15:04:39 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 2 Jul 2018 07:04:39 -0700 Subject: [gpfsug-discuss] Zimon metrics details In-Reply-To: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> References: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> Message-ID: <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone > On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: > > Hi everybody, > > I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. > Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) > > Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? > The SS probelm determination guide doens?t spend more than half a line for each. > > In particular I would like to understand the difference between these ones: > > - gpfs_fs_bytes_read > - gpfs_fis_bytes_read > > The second gives tipically higher values than the first one. > > Thanks for any hit. > > Regards, > > Giuseppe > > *********************************************************************** > > Giuseppe Lo Re > > CSCS - Swiss National Supercomputing Center > > Via Trevano 131 > > CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 > > Switzerland Email: giuseppe.lore at cscs.ch > > *********************************************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From agar at us.ibm.com Mon Jul 2 16:05:33 2018 From: agar at us.ibm.com (Eric Agar) Date: Mon, 2 Jul 2018 11:05:33 -0400 Subject: [gpfsug-discuss] Zimon metrics details In-Reply-To: <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> References: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> Message-ID: Hello Giuseppe, Following was my attempt to answer a similar question some months ago. When reading about the different viewpoints of the Zimon sensors, please note that gpfs_fis_bytes_read is a metric provided by the GPFSFileSystemAPI sensor, while gpfs_fs_bytes_read is a metric provided by the GPFSFileSystem sensor. Therefore, gpfs_fis_bytes_read reflects application reads, while gpfs_fs_bytes_read reflects NSD reads. The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. 
An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause the writing of disk blocks, due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. Eric M. Agar agar at us.ibm.com IBM Spectrum Scale Level 2 Software Defined Infrastructure, IBM Systems From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sandeep.patil at in.ibm.com Mon Jul 2 19:43:20 2018 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Mon, 2 Jul 2018 18:43:20 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on Spectrum Scale (Q2 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Mon Jul 2 21:17:26 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 2 Jul 2018 22:17:26 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi, Carl, Sven had mentioned the RMW penalty before which could make it beneficial to use smaller blocks. If you have traditional RAIDs and you go the usual route to do track sizes equal to the block size (stripe size = BS/n with n+p RAIDs), you may run into problems if your I/O are typically or very often smaller than a block because the controller needs to read the entire track, modifies it according to your I/O, and writes it back with the parity stripes. Example: with 4MiB BS and 8+2 RAIDS as NSDs, on each I/O smaller than 4MiB reaching an NSD the controller needs to read 4MiB into a buffer, modify it according to your I/O, calculate parity for the whole track and write back 5MiB (8 data stripes of 512kiB plus two parity stripes). In those cases you might be better off with smaller block sizes. In the above scenario, it might however still be ok to leave the block size at 4MiB and just reduce the track size of the RAIDs. One has to check how that affects performance, YMMV I'd say here. Mind that the ESS uses a clever way to mask these type of I/O from the n+p RS based vdisks, but even there one might need to think ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Carl To: gpfsug main discussion list Date: 02/07/2018 11:57 Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Olaf and Sven, It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata ) is no longer relevant for version 5. Any idea if its likely to be updated soon? The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms there any situations where you would recommend using less than the new default block size? Cheers, Carl. On Mon, 2 Jul 2018 at 17:55, Sven Oehme wrote: Olaf, he is talking about indirect size not subblock size . 
Carl, here is a screen shot of a 4mb filesystem : [root at p8n15hyp ~]# mmlsfs all_local File system attributes for /dev/fs2-4m-07: ========================================== flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 4194304 Block size -Q none Quotas accounting enabled none Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Mon Jun 18 12:30:54 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 4000000000 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in file system -A no Automatic mount option -o none Additional mount options -T /gpfs/fs2-4m-07 Default mount point --mount-priority 0 Mount priority as you can see indirect size is 32k sven On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: HI Carl, 8k for 4 M Blocksize files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at least one "subblock" be allocated .. in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is retrieved from the blocksize ... since R >5 (so new created file systems) .. the new default block size is 4 MB, fragment size is 8k (512 subblocks) for even larger block sizes ... more subblocks are available per block so e.g. 8M .... 1024 subblocks (fragment size is 8 k again) @Sven.. correct me, if I'm wrong ... From: Carl To: gpfsug main discussion list Date: 07/02/2018 08:55 AM Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Sven, What is the resulting indirect-block size with a 4mb metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k? Cheers, Carl. On Mon, 2 Jul 2018 at 15:26, Sven Oehme wrote: Hi, most traditional raid controllers can't deal well with blocksizes above 4m, which is why the new default is 4m and i would leave it at that unless you know for sure you get better performance with 8mb which typically requires your raid controller volume full block size to be 8mb with maybe a 8+2p @1mb strip size (many people confuse strip size with full track size) . 
if you don't have dedicated SSDs for metadata i would recommend to just use a 4mb blocksize with mixed data and metadata disks, if you have a reasonable number of SSD's put them in a raid 1 or raid 10 and use them as dedicated metadata and the other disks as dataonly , but i would not use the --metadata-block-size parameter as it prevents the datapool to use large number of subblocks. as long as your SSDs are on raid 1 or 10 there is no read/modify/write penalty, so using them with the 4mb blocksize has no real negative impact at least on controllers i have worked with. hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: Hi, it's for a traditional NSD setup. --Joey On 6/26/18 12:21 AM, Sven Oehme wrote: Joseph, the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb. is this setup for a traditional NSD Setup or for GNR as the recommendations would be different. sven On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 131072 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -B 524288 Block size (system pool) 8388608 Block size (other pools) -V 19.01 (5.0.1.0) File system version --subblocks-per-full-block 64 Number of subblocks per full block -P system;DATA Disk storage pools in file system Thanks! 
--Joey Mendoza NCAR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From lore at cscs.ch Tue Jul 3 09:05:41 2018 From: lore at cscs.ch (Lo Re Giuseppe) Date: Tue, 3 Jul 2018 08:05:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 78, Issue 6 In-Reply-To: References: Message-ID: Dear Eric, thanks a lot for this information. And what about the gpfs_vfs metric group? What is the difference beteween for example ?gpfs_fis_read_calls" and ?gpfs_vfs_read? ? Again I see the second one being tipically higher than the first one. In addition gpfs_vfs_read is not related to a specific file system... [root at ela5 ~]# mmperfmon query gpfs_fis_read_calls -n1 -b 60 Legend: 1: ela5.cscs.ch|GPFSFilesystemAPI|durand.cscs.ch|store|gpfs_fis_read_calls 2: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|apps|gpfs_fis_read_calls 3: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|project|gpfs_fis_read_calls 4: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|users|gpfs_fis_read_calls Row Timestamp gpfs_fis_read_calls gpfs_fis_read_calls gpfs_fis_read_calls gpfs_fis_read_calls 1 2018-07-03-10:03:00 0 0 7274 0 [root at ela5 ~]# mmperfmon query gpfs_vfs_read -n1 -b 60 Legend: 1: ela5.cscs.ch|GPFSVFS|gpfs_vfs_read Row Timestamp gpfs_vfs_read 1 2018-07-03-10:03:00 45123 Cheers, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** Hello Giuseppe, Following was my attempt to answer a similar question some months ago. When reading about the different viewpoints of the Zimon sensors, please note that gpfs_fis_bytes_read is a metric provided by the GPFSFileSystemAPI sensor, while gpfs_fs_bytes_read is a metric provided by the GPFSFileSystem sensor. Therefore, gpfs_fis_bytes_read reflects application reads, while gpfs_fs_bytes_read reflects NSD reads. The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. 
Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause the writing of disk blocks, due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. Eric M. Agar agar at us.ibm.com IBM Spectrum Scale Level 2 Software Defined Infrastructure, IBM Systems From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 6 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... 
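A quick way to see the distinction described above in practice is to sample the application-level and NSD-level counters for the same file system over the same interval on one node, for example (a sketch only, using the metric names discussed in this thread; it assumes the GPFSFileSystemAPI and GPFSFileSystem sensors are both enabled in the local perfmon configuration):

mmperfmon query gpfs_fis_bytes_read -n1 -b 60
mmperfmon query gpfs_fs_bytes_read -n1 -b 60

Comparing the two results for the same file system shows how far application reads and NSD reads diverge on that node: pagepool hits push the application-side number above the NSD-side number, while prefetch and metadata reads push the NSD-side number above it.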
From Cameron.Dunn at bristol.ac.uk Tue Jul 3 12:49:03 2018 From: Cameron.Dunn at bristol.ac.uk (Cameron Dunn) Date: Tue, 3 Jul 2018 11:49:03 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms Message-ID:

HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape.

According to the Samba documentation this is preventable by setting the following
----------------------------------------------
https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html
gpfs:recalls = [ yes | no ]
When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer.
yes(default) - Open files that are offline. This will recall the files from HSM.
no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes.
gpfs:hsm = [ yes | no ]
Enable/Disable announcing if this FS has HSM enabled.
no(default) - Do not announce HSM.
yes - Announce HSM.
--------------------------------------------------

However we could not get this to work. On Centos7/Samba4.5, smb.conf contained
gpfs:hsm = yes
gpfs:recalls = no
(we also tried setting gpfs:offline = yes, though this is not documented)

We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) nor Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed.

Has anyone seen these flags working as they are supposed to?

Many thanks for any ideas, Cameron

Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From christof.schmitt at us.ibm.com Tue Jul 3 20:37:08 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 3 Jul 2018 19:37:08 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL:

From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 17:43:20 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 16:43:20 +0000 Subject: [gpfsug-discuss] High I/O wait times Message-ID: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu>

Hi all,

We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by "mmdiag --iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent; i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check, which is sometimes just a few minutes apart.

In the past when I have seen "mmdiag --iohist" report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that's *not* happening this time.
Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Jul 3 21:11:17 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 3 Jul 2018 16:11:17 -0400 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 22:41:17 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 21:41:17 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). 
In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Jul 3 22:53:19 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 3 Jul 2018 17:53:19 -0400 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Message-ID: How many NSDs are served by the NSD servers and what is your maximum file system block size? Have you confirmed that you have sufficient NSD worker threads to handle the maximum number of IOs you are configured to have active? That would be the number of NSDs served times 12 (you have 12 threads per queue). Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 05:41 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. 
I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 23:05:25 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 22:05:25 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Message-ID: <2CB5B62E-A40A-4C47-B2D1-137BE87FBDDA@vanderbilt.edu> Hi Fred, I have a total of 48 NSDs served up by 8 NSD servers. 12 of those NSDs are in our small /home filesystem, which is performing just fine. The other 36 are in our ~1 PB /scratch and /data filesystem, which is where the problem is. Our max filesystem block size parameter is set to 16 MB, but the aforementioned filesystem uses a 1 MB block size. nsdMaxWorkerThreads is set to 1024 as shown below. Since each NSD server serves an average of 6 NSDs and 6 x 12 = 72 we?re OK if I?m understanding the calculation correctly. Even multiplying 48 x 12 = 576, so we?re good?!? Your help is much appreciated! Thanks again? Kevin On Jul 3, 2018, at 4:53 PM, Frederick Stock > wrote: How many NSDs are served by the NSD servers and what is your maximum file system block size? Have you confirmed that you have sufficient NSD worker threads to handle the maximum number of IOs you are configured to have active? That would be the number of NSDs served times 12 (you have 12 threads per queue). Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/03/2018 05:41 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. 
And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock > wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110933587&sdata=RKuWKLRGoBRMSDHkrMsKsuU6JkiFgruK4e7gGafxAGc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scrusan at ddn.com Tue Jul 3 23:01:48 2018 From: scrusan at ddn.com (Steve Crusan) Date: Tue, 3 Jul 2018 22:01:48 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Kevin, While this is happening, are you able to grab latency stats per LUN (hardware vendor agnostic) to see if there are any outliers? Also, when looking at the mmdiag output, are both reads and writes affected? Depending on the storage hardware, your writes might be hitting cache, so maybe this problem is being exasperated by many small reads (that are too random to be coalesced, take advantage of drive NCQ, etc). The other response about the nsd threads is also a good start, but if the I/O waits shift between different NSD servers and across hardware vendors, my assumption would be that you are hitting a bottleneck somewhere, but what you are seeing is symptoms of I/O backlog, which can manifest at any number of places. This could be something as low level as a few slow drives. Have you just started noticing this behavior? Any new applications on your system? Going by your institution, you're probably supposing a wide variety of codes, so if these problems just started happening, its possible that someone changed their code, or decided to run new scientific packages. -Steve ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: Tuesday, July 03, 2018 11:43 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] High I/O wait times Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From taylorm at us.ibm.com Tue Jul 3 23:25:55 2018 From: taylorm at us.ibm.com (Michael L Taylor) Date: Tue, 3 Jul 2018 15:25:55 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 78, Issue 6 In-Reply-To: References: Message-ID: Hi Giuseppe, The GUI happens to document some of the zimon metrics in the KC here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monperfmetrics.htm Hopefully that gets you a bit more of what you need but does not cover everything. Today's Topics: 1. Zimon metrics details (Lo Re Giuseppe) 2. Re: Zimon metrics details (Kristy Kallback-Rose) 3. 
Re: Zimon metrics details (Eric Agar) From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Wed Jul 4 06:47:28 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 4 Jul 2018 05:47:28 +0000 Subject: [gpfsug-discuss] Filesystem Operation error Message-ID: <254f2811c2b14c9d8c82403d393d0178@SMXRF105.msg.hukrf.de> Hallo All, follow a short story from yesterday on Version 5.0.1.1. We had a 3 - Node cluster (2 Nodes for IO and the third for a quorum Buster function). A Admin make a mistake an take a delete of the 3 Node (VM). We restored ist with a VM Snapshot no Problem. The only point here we lost complete 7 desconly disk. We defined new one and want to delete this disk with mmdeldisk. On 6 Filesystems no problem but one has now a Problem. We delete this disk finaly with mmdeldisk fsname -p. And we see now after a successfully mmdelnsd the old disk already in following display. mmlsdisk tsmconf -L disk driver sector failure holds holds storage name type size group metadata data status availability disk id pool remarks ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- nsd_tsmconf001_DSK20 nsd 512 0 Yes Yes ready up 1 system desc nsd_g4_tsmconf nsd 512 2 No No removing refs down 2 system nsd_tsmconf001_DSK70 nsd 512 1 Yes Yes ready up 3 system desc nsd_g4_tsmconf1 nsd 512 2 No No ready up 4 system desc After that all fs-cmd geneate a fs operation error here like this. Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=3882673: Unrecoverable file system operation error. Status code 65536. Volume tsmconf Questions: 1. What does this mean ?removing refs?. Now we don?t have the possibility to handle these disk. The disk itself is no more existend, but in the stripegroup a referenz is available. nsd_g4_tsmconf: uid 0A885085:577BB637, status ReferencesBeingRemoved, availability Unavailable, created on node 10.136.80.133, Tue Jul 5 15:29:27 2016 type 'nsd', sector size 512, failureConfigVersion 424 quorum weight {0,0}, failure group: id 2, fg index 1 locality group: id 2, lg index 1 failureGroupStrP: (2), rackId 2, locationId 0, extLgId 0 nSectors 528384 (0:81000) (258 MB), inode0Sector 131072 alloc region: no of bits 0, seg num -1, offset 0, len 72 suballocator 0x18015B8A7A4 type 0 nBits 32 subSize 0 dataOffset 4 nRows 0 len/off: storage pool: 0 holds nothing sectors past efficient device boundary: 0 isFenced: 1 start Region No: -1 end Region No:-1 start AllocMap Record: -1 2. Are there any cmd to handle these? 3. 
Where can I find the Status code 65536? A PMR is also open. Any Hints? Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tees at us.ibm.com Wed Jul 4 03:43:28 2018 From: tees at us.ibm.com (Stephen M Tee) Date: Tue, 3 Jul 2018 21:43:28 -0500 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: You dont state whether your running GPFS or ESS and which level. One thing you can check, is whether the SES and enclosure drivers are being loaded. The lsmod command will show if they are. These drivers were found to cause SCSI IO hangs in Linux RH7.3 and 7.4. If they are being loaded, you can blacklist and unload them with no impact to ESS/GNR By default these drivers are blacklisted in ESS. Stephen Tee ESS Storage Development IBM Systems and Technology Austin, TX 512-963-7177 From: Steve Crusan To: gpfsug main discussion list Date: 07/03/2018 05:08 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Kevin, While this is happening, are you able to grab latency stats per LUN (hardware vendor agnostic) to see if there are any outliers? Also, when looking at the mmdiag output, are both reads and writes affected? Depending on the storage hardware, your writes might be hitting cache, so maybe this problem is being exasperated by many small reads (that are too random to be coalesced, take advantage of drive NCQ, etc). The other response about the nsd threads is also a good start, but if the I/O waits shift between different NSD servers and across hardware vendors, my assumption would be that you are hitting a bottleneck somewhere, but what you are seeing is symptoms of I/O backlog, which can manifest at any number of places. This could be something as low level as a few slow drives. Have you just started noticing this behavior? Any new applications on your system? 
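For the vendor-agnostic per-LUN latency numbers Steve asks about above, plain sysstat is usually enough. The sketch below samples iostat on an NSD server and reports the worst await-style latency per device; the exact *await column labels are an assumption, since different sysstat releases name and order them differently.

# Sample for 5 seconds and keep only the last iostat report (the first one is
# averages since boot). Requires sysstat's iostat on the NSD server.
import subprocess

out = subprocess.check_output(["iostat", "-x", "5", "2"], universal_newlines=True)
lines = out.splitlines()

hdr_idx = [i for i, l in enumerate(lines) if l.startswith("Device")]
if not hdr_idx:
    raise SystemExit("no 'Device' header found -- is sysstat installed?")
hdr_idx = hdr_idx[-1]
header = lines[hdr_idx].replace(":", "").split()
await_cols = [i for i, name in enumerate(header) if "await" in name]  # assumed labels
if not await_cols:
    raise SystemExit("no *await columns found; adjust for your sysstat version")

worst = []
for line in lines[hdr_idx + 1:]:
    tok = line.split()
    if len(tok) != len(header):
        continue
    try:
        lat = max(float(tok[i]) for i in await_cols)
    except (ValueError, IndexError):
        continue
    worst.append((lat, tok[0]))

for lat, dev in sorted(worst, reverse=True)[:15]:
    print("%-24s %8.2f ms" % (dev, lat))

The device or multipath names it prints can then be mapped back to NSDs with "mmlsnsd -m" to see whether the slow LUNs line up with the NSDs flagged by mmdiag.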
Going by your institution, you're probably supposing a wide variety of codes, so if these problems just started happening, its possible that someone changed their code, or decided to run new scientific packages. -Steve ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: Tuesday, July 03, 2018 11:43 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] High I/O wait times Hi all, not We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Wed Jul 4 13:34:43 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 4 Jul 2018 08:34:43 -0400 (EDT) Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Hi Kevin, Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > Hi all, > We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. ?One of the > confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from > NSD to NSD (and storage array to storage array) whenever we check ? 
which is sometimes just a few minutes apart. > > In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. ?In our environment, the most common cause has > been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. ?But that?s *not* happening this time. > Is there anything within GPFS / outside of a hardware issue that I should be looking for?? ?Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu?- (615)875-9633 > > > > > From Renar.Grunenberg at huk-coburg.de Thu Jul 5 08:02:36 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 5 Jul 2018 07:02:36 +0000 Subject: [gpfsug-discuss] Filesystem Operation error In-Reply-To: <037a7d7f52bf4a6a83406c8c26fa4d82@SMXRF105.msg.hukrf.de> References: <037a7d7f52bf4a6a83406c8c26fa4d82@SMXRF105.msg.hukrf.de> Message-ID: <8fb424ee10404400ac6b81d985dd5bf9@SMXRF105.msg.hukrf.de> Hallo All, we fixed our Problem here with Spectrum Scale Support. The fixing cmd were ?mmcommon recoverfs tsmconf? and ?tsdeldisk tsmconf -d "nsd_g4_tsmconf". The final reason for this problem, if I want to delete a disk in a filesystem all disk must be reachable from the requesting host. In our config the NSD-Server had no NSD-Server Definitions and the Quorum Buster Node had no access to the SAN attached disk. A Recommendation from my site here are: This should be documented for a high available config with a 3 side implementation, or the cmds that want to update the nsd-descriptors for each disk should check are any disk reachable and don?t do a SG-Panic. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Mittwoch, 4. Juli 2018 07:47 An: 'gpfsug-discuss at spectrumscale.org' Betreff: Filesystem Operation error Hallo All, follow a short story from yesterday on Version 5.0.1.1. We had a 3 - Node cluster (2 Nodes for IO and the third for a quorum Buster function). A Admin make a mistake an take a delete of the 3 Node (VM). 
We restored ist with a VM Snapshot no Problem. The only point here we lost complete 7 desconly disk. We defined new one and want to delete this disk with mmdeldisk. On 6 Filesystems no problem but one has now a Problem. We delete this disk finaly with mmdeldisk fsname -p. And we see now after a successfully mmdelnsd the old disk already in following display. mmlsdisk tsmconf -L disk driver sector failure holds holds storage name type size group metadata data status availability disk id pool remarks ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- nsd_tsmconf001_DSK20 nsd 512 0 Yes Yes ready up 1 system desc nsd_g4_tsmconf nsd 512 2 No No removing refs down 2 system nsd_tsmconf001_DSK70 nsd 512 1 Yes Yes ready up 3 system desc nsd_g4_tsmconf1 nsd 512 2 No No ready up 4 system desc After that all fs-cmd geneate a fs operation error here like this. Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=3882673: Unrecoverable file system operation error. Status code 65536. Volume tsmconf Questions: 1. What does this mean ?removing refs?. Now we don?t have the possibility to handle these disk. The disk itself is no more existend, but in the stripegroup a referenz is available. nsd_g4_tsmconf: uid 0A885085:577BB637, status ReferencesBeingRemoved, availability Unavailable, created on node 10.136.80.133, Tue Jul 5 15:29:27 2016 type 'nsd', sector size 512, failureConfigVersion 424 quorum weight {0,0}, failure group: id 2, fg index 1 locality group: id 2, lg index 1 failureGroupStrP: (2), rackId 2, locationId 0, extLgId 0 nSectors 528384 (0:81000) (258 MB), inode0Sector 131072 alloc region: no of bits 0, seg num -1, offset 0, len 72 suballocator 0x18015B8A7A4 type 0 nBits 32 subSize 0 dataOffset 4 nRows 0 len/off: storage pool: 0 holds nothing sectors past efficient device boundary: 0 isFenced: 1 start Region No: -1 end Region No:-1 start AllocMap Record: -1 2. Are there any cmd to handle these? 3. Where can I find the Status code 65536? A PMR is also open. Any Hints? Regards Renar -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Thu Jul 5 09:28:51 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 5 Jul 2018 08:28:51 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> , Message-ID: <83A6EEB0EC738F459A39439733AE804526729376@MBX114.d.ethz.ch> Hello Daniel, I've solved my problem disabling the check (I've gpfs v4.2.3-5) by putting ib_rdma_enable_monitoring=False in the [network] section of the file /var/mmfs/mmsysmon/mmsysmonitor.conf, and restarting the mmsysmonitor. There was a thread in this group about this problem. 
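For anyone finding this thread later, the change amounts to the sketch below. It assumes the file is plain INI that Python's ConfigParser can round-trip, which drops comments and ordering on rewrite, so editing it by hand is arguably safer; the restart command is the one mentioned in the reply quoted underneath.

# Sketch: disable the ib_rdma monitoring check in the sysmonitor config and
# restart the monitor. Rewriting with ConfigParser loses comments, so hand
# editing /var/mmfs/mmsysmon/mmsysmonitor.conf may be preferable.
try:
    from configparser import ConfigParser                       # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser   # Python 2
import subprocess

CONF = "/var/mmfs/mmsysmon/mmsysmonitor.conf"

cp = ConfigParser()
cp.read(CONF)
if not cp.has_section("network"):
    cp.add_section("network")
cp.set("network", "ib_rdma_enable_monitoring", "False")
with open(CONF, "w") as fh:
    cp.write(fh)

subprocess.call(["/usr/lpp/mmfs/bin/mmsysmoncontrol", "restart"])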
A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Yaron Daniel [YARD at il.ibm.com] Sent: Sunday, July 01, 2018 7:17 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Hi There is was issue with Scale 5.x GUI error - ib_rdma_nic_unrecognized(mlx5_0/2) Check if you have the patch: [root at gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py 229c229,230 < recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) --- > #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) > recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm)) And restart the - mmsysmoncontrol restart Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0B5B5F080B5B5954005EFD8BC22582BD] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_06EDAF6406EDA744005EFD8BC22582BD][cid:_1_06EDB16C06EDA744005EFD8BC22582BD] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 06/28/2018 11:16 AM Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! 
verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: ATT00001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 4376 bytes Desc: ATT00003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 5093 bytes Desc: ATT00004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00005.gif Type: image/gif Size: 4746 bytes Desc: ATT00005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 4557 bytes Desc: ATT00006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.gif Type: image/gif Size: 5093 bytes Desc: ATT00007.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00008.jpg Type: image/jpeg Size: 11294 bytes Desc: ATT00008.jpg URL: From michael.holliday at crick.ac.uk Wed Jul 4 12:37:52 2018 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 4 Jul 2018 11:37:52 +0000 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Hi All, Those commands show no errors not do any of the log files. GPFS has started correctly and showing the cluster and all nodes as up and active. We appear to have found the command that is hanging during the mount - However I'm not sure why its hanging. mmwmi mountedfilesystems Michael From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Yaron Daniel Sent: 20 June 2018 16:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Windows Mount Also what does mmdiag --network + mmgetstate -a show ? 
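Coming back to the mmhealth/nagios point a little further up in this digest: the sensor that parses the -Y output does not need to be elaborate. The sketch below assumes the usual GPFS machine-readable layout (a HEADER record naming the fields, then one record per component); the "status" and "component" field names are assumptions to check against the output of your release.

# Nagios-style probe built on "mmhealth node show -Y" (a sketch, not a
# supported plugin). Field names are assumptions -- verify against your output.
import subprocess
import sys

out = subprocess.check_output(
    ["/usr/lpp/mmfs/bin/mmhealth", "node", "show", "-Y"],
    universal_newlines=True)

headers = {}        # record type -> field name list taken from the HEADER rows
seen, bad = 0, []
for line in out.splitlines():
    f = line.strip().split(":")
    if len(f) < 3:
        continue
    rectype, kind = f[1], f[2]
    if kind == "HEADER":
        headers[rectype] = f
        continue
    row = dict(zip(headers.get(rectype, []), f))
    status = row.get("status")
    if not status:
        continue
    seen += 1
    if status not in ("HEALTHY", "DISABLED"):
        bad.append("%s=%s" % (row.get("component", rectype), status))

if not seen:
    print("UNKNOWN - no status fields recognized; adjust the assumed field names")
    sys.exit(3)
if bad:
    print("WARNING - " + ", ".join(sorted(set(bad))))
    sys.exit(1)
print("OK - all monitored components HEALTHY")
sys.exit(0)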
Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.gif at 01D41393.D1DEB220] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:image004.gif at 01D41393.D1DEB220][cid:image005.gif at 01D41393.D1DEB220] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: "Yaron Daniel" > To: gpfsug main discussion list > Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.gif at 01D41393.D1DEB220] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:image004.gif at 01D41393.D1DEB220][cid:image005.gif at 01D41393.D1DEB220][https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Michael Holliday > To: "gpfsug-discuss at spectrumscale.org" > Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We've being trying to get the windows system to mount GPFS. We've set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing - GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 1851 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 4376 bytes Desc: image002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 5093 bytes Desc: image003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.gif Type: image/gif Size: 4746 bytes Desc: image004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.gif Type: image/gif Size: 4557 bytes Desc: image005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.gif Type: image/gif Size: 5093 bytes Desc: image006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.jpg Type: image/jpeg Size: 11294 bytes Desc: image007.jpg URL: From heiner.billich at psi.ch Thu Jul 5 17:00:08 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 5 Jul 2018 16:00:08 +0000 Subject: [gpfsug-discuss] -o syncnfs has no effect? Message-ID: Hello, I try to mount a fs with "-o syncnfs" as we'll export it with CES/Protocols. But I never see the mount option displayed when I do # mount | grep fs-name This is a remote cluster mount, we'll run the Protocol nodes in a separate cluster. On the home cluster I see the option 'nfssync' in the output of 'mount'. My conclusion is that the mount option "syncnfs" has no effect on remote cluster mounts. Which seems a bit strange? Please can someone clarify on this? What is the impact on protocol nodes exporting remote cluster mounts? Is there any chance of data corruption? Or are some mount options implicitely inherited from the home cluster? I've read 'syncnfs' is default on Linux, but I would like to know for sure. Funny enough I can pass arbitrary options with # mmmount -o some-garbage which are silently ignored. I did 'mmchfs -o syncnfs' on the home cluster and the syncnfs option is present in /etc/fstab on the remote cluster. I did not remount on all nodes __ Thank you, I'll appreciate any hints or replies. Heiner Versions: Remote cluster 5.0.1 on RHEL7.4 (imounts the fs and runs protocol nodes) Home cluster 4.2.3-8 on RHEL6 (export the fs, owns the storage) Filesystem: 17.00 (4.2.3.0) All Linux x86_64 with Spectrum Scale Standard Edition -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From emanners at fsu.edu Thu Jul 5 19:53:36 2018 From: emanners at fsu.edu (Edson Manners) Date: Thu, 5 Jul 2018 14:53:36 -0400 Subject: [gpfsug-discuss] GPFS GUI Message-ID: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> There was another thread on here about the following error in the GUI: Event name: gui_cluster_down Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. But it looks like the resolution happened in another channel. I have the exact same problem even though we're running a production GPFS cluster that seems to work perfectly fine. This is the last error in the GUI that I'm trying to get solved. What would be the best way to try to troubleshoot this. 
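A reasonable first pass for a gui_cluster_down event is to compare what the GUI believes with what the cluster itself reports. The wrapper below simply collects the relevant views in one place; the command paths are the standard ones, and the gpfsgui systemd unit name is an assumption to adjust if your GUI service is named differently.

# Collect the daemon, mmhealth and GUI-service views side by side.
import subprocess

CMDS = [
    ["/usr/lpp/mmfs/bin/mmgetstate", "-a"],              # daemon and quorum state
    ["/usr/lpp/mmfs/bin/mmhealth", "cluster", "show"],   # cluster-wide health view
    ["/usr/lpp/mmfs/bin/mmhealth", "node", "show"],
    ["systemctl", "status", "gpfsgui"],                  # GUI service; unit name assumed
]

for cmd in CMDS:
    print("### " + " ".join(cmd))
    try:
        print(subprocess.check_output(cmd, stderr=subprocess.STDOUT,
                                      universal_newlines=True))
    except subprocess.CalledProcessError as err:
        print(err.output)                                # e.g. systemctl on a stopped unit
    except OSError as err:
        print("could not run %s: %s" % (cmd[0], err))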
-- [Any errors in spelling, tact or fact are transmission errors] - (Stolen from) Dag Wieers Edson Manners Research Computing Center FSU Information Technology Services Dirac Science Library., Room 150G Tallahassee, Florida 32306-4120 From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 6 02:11:17 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 6 Jul 2018 01:11:17 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 
348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more "load" on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized -- but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else -- the difference between them is that A and B are on the storage array itself and C and E are on JBODs SAS-attached to the storage array (and yes, we've actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each -- and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn't help me. So we're trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the "mmfsadm dump nsd" output tells me -- well, I don't know what it tells me. I think that's all for now.
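For anyone who wants to do the same roll-up, a cut-down sketch of this kind of post-processing is below. It is an illustration, not the actual script, and it only understands summary lines in exactly the form pasted above, fed on stdin.

# Cut-down sketch: read per-server summary blocks like the ones above and
# point out queues with a backlog or with nearly all of their threads busy.
import re
import sys

QUEUE_RE  = re.compile(r"(\d+) (SMALL|LARGE) queues, (\d+) requests pending, (\d+) was the highest")
THREAD_RE = re.compile(r"(\d+) threads started, (\d+) threads active, (\d+) was the highest")

server, qtype = None, None
for line in sys.stdin:
    line = line.strip()
    q = QUEUE_RE.match(line)
    t = THREAD_RE.match(line)
    if q:
        qtype, pending, high_pending = q.group(2), int(q.group(3)), int(q.group(4))
        if pending:
            print("%s %-5s: %4d requests pending (peak %d)"
                  % (server, qtype, pending, high_pending))
    elif t:
        started, active = int(t.group(1)), int(t.group(2))
        if qtype and active >= 0.9 * started:
            print("%s %-5s: %d of %d threads busy <-- saturated"
                  % (server, qtype, active, started))
    elif line:
        server = line.split()[0]     # a bare "nsdN" name starts a new block

Fed the eight per-server blocks above, it immediately singles out the LARGE queue on nsd2 (340 of 348 threads busy, 470 requests pending) and the large backlogs on nsd5.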
If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 From andreas.koeninger at de.ibm.com Fri Jul 6 07:38:07 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Fri, 6 Jul 2018 06:38:07 +0000 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> References: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> Message-ID: An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Fri Jul 6 14:02:38 2018 From: jjdoherty at yahoo.com (Jim Doherty) Date: Fri, 6 Jul 2018 13:02:38 +0000 (UTC) Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: <733478365.61492.1530882158667@mail.yahoo.com> You may want to get an mmtrace,? but I suspect that the disk IOs are slow.???? The iohist is showing the time from when the start IO was issued until it was finished.??? 
Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it.??? If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue.? While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week.? You?re correct about our mixed workload.? There have been no new workloads that I am aware of. Stephen - no, this is not an ESS.? We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN.? Commodity hardware for the servers and storage.? We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks.? Linux multipathing handles path failures.? 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time).? So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array.? As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output.? We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. ??? 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. ??? 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. ??? 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. ??? 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. ??? 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. ??? 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. ??? 504 threads started, 8 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. ??? 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. ??? 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. ??? 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. ??? 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. ??? 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. ??? 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. ??? 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. ??? 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. ??? 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity.? Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized.? And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB.? That has not made a difference.? How can I determine how much of the pagepool is actually being used, BTW?? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns.? The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way.? The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now.? If you have read this entire very long e-mail, first off, thank you!? If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! 
Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why.? One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related.? In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk.? But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for??? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From emanners at fsu.edu Fri Jul 6 14:05:32 2018 From: emanners at fsu.edu (Edson Manners) Date: Fri, 6 Jul 2018 13:05:32 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: Ok. I'm on 4.2.3-5. So would this bug still show up if my remote filesystem is mounted? Because it is. Thanks. On 7/6/2018 2:38:21 AM, Andreas Koeninger wrote: Which version are you using? There was a bug in 4.2.3.6 and before related to unmounted remote filesystems which could lead to a gui_cluster_down event on the local cluster. 
Mit freundlichen Grüßen / Kind regards
Andreas Koeninger
Scrum Master and Software Developer / Spectrum Scale GUI and REST API
IBM Systems & Technology Group, Integrated Systems Development / M069
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
----- Original message -----
From: Edson Manners Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] GPFS GUI Date: Thu, Jul 5, 2018 11:38 PM
There was another thread on here about the following error in the GUI: Event name: gui_cluster_down Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. But it looks like the resolution happened in another channel. I have the exact same problem even though we're running a production GPFS cluster that seems to work perfectly fine. This is the last error in the GUI that I'm trying to get solved. What would be the best way to try to troubleshoot this?
-- [Any errors in spelling, tact or fact are transmission errors] - (Stolen from) Dag Wieers
Edson Manners Research Computing Center FSU Information Technology Services Dirac Science Library, Room 150G Tallahassee, Florida 32306-4120
_______________________________________________
gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From andreas.koeninger at de.ibm.com Fri Jul 6 14:31:32 2018
From: andreas.koeninger at de.ibm.com (Andreas Koeninger)
Date: Fri, 6 Jul 2018 13:31:32 +0000
Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: References: Message-ID:
An HTML attachment was scrubbed... URL:
From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 6 15:27:51 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Fri, 6 Jul 2018 14:27:51 +0000
Subject: [gpfsug-discuss] High I/O wait times
In-Reply-To: <733478365.61492.1530882158667@mail.yahoo.com>
References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com>
Message-ID:
Hi Jim,
Thank you for your response. We are taking a two-pronged approach at this point:
1. While I don't see anything wrong with our storage arrays, I have opened a ticket with the vendor (not IBM) to get them to look at things from that angle.
2. Since the problem moves around from time to time, we are enhancing our monitoring script to see if we can basically go from "mmdiag --iohist" to "clients issuing those I/O requests" to "jobs running on those clients" to see if there is any commonality there.
Thanks again - much appreciated!
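For anyone who wants to attempt the same correlation, a rough shell sketch follows. It is not the script referred to above: the column layout of "mmdiag --iohist" varies by release, so the TIME_COL and CLIENT_COL values are assumptions to verify against the header line on your own NSD servers, the 1000 ms threshold is arbitrary, and the server names are the anonymised ones used in this thread.

# Sketch only: list slow I/Os per NSD server and count which client nodes issued them.
THRESH_MS=1000
TIME_COL=7        # assumed position of the "time ms" field - check your header first
CLIENT_COL=9      # assumed position of the client / NSD node field - check as well
for srv in nsd1 nsd2 nsd3 nsd4 nsd5 nsd6 nsd7 nsd8; do
    echo "=== $srv ==="
    ssh "$srv" /usr/lpp/mmfs/bin/mmdiag --iohist | \
        awk -v t="$THRESH_MS" -v tc="$TIME_COL" -v cc="$CLIENT_COL" \
            '$tc+0 > t { print $cc }' | sort | uniq -c | sort -rn
done
# The node names printed above can then be matched against the batch scheduler
# to see which jobs were running on those clients at the time.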
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 6, 2018, at 8:02 AM, Jim Doherty > wrote: You may want to get an mmtrace, but I suspect that the disk IOs are slow. The iohist is showing the time from when the start IO was issued until it was finished. Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it. If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L > wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. 
The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Fri Jul 6 18:13:26 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Fri, 6 Jul 2018 10:13:26 -0700 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: Hi Kevin, This is a bit of a "cargo cult" suggestion but one issue that I have seen is if a disk starts misbehaving a bit but does not fail, it slows down the whole raid group that it is in. And the only way to detect it is to examine the read/write latencies on the individual disks. Does your SAN allow you to do that? That happened to me at least twice in my life and replacing the offending individual disk solved the issue. This was on DDN, so the relevant command were something like 'show pd * counters write_lat' or similar, which showed the latency for the I/Os for each disk. If one disk in the group is an outlier (e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for that one disk. Another possibility for troubleshooting, if you have sufficient free resources: you can just suspend the problematic LUNs in GPFS, as that will remove the write load from them, while still having them service read requests and not affecting users. Regards, Alex On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Jim, > > Thank you for your response. We are taking a two-pronged approach at this > point: > > 1. While I don?t see anything wrong with our storage arrays, I have > opened a ticket with the vendor (not IBM) to get them to look at things > from that angle. > > 2. Since the problem moves around from time to time, we are enhancing our > monitoring script to see if we can basically go from ?mmdiag ?iohist? to > ?clients issuing those I/O requests? to ?jobs running on those clients? to > see if there is any commonality there. > > Thanks again - much appreciated! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > On Jul 6, 2018, at 8:02 AM, Jim Doherty wrote: > > You may want to get an mmtrace, but I suspect that the disk IOs are > slow. The iohist is showing the time from when the start IO was issued > until it was finished. Of course if you have disk IOs taking 10x too > long then other IOs are going to queue up behind it. If there are more > IOs than there are NSD server threads then there are going to be IOs that > are queued and waiting for a thread. > > Jim > > > On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L < > Kevin.Buterbaugh at Vanderbilt.Edu> wrote: > > > Hi All, > > First off, my apologies for the delay in responding back to the list ? > we?ve actually been working our tails off on this one trying to collect as > much data as we can on what is a very weird issue. While I?m responding to > Aaron?s e-mail, I?m going to try to address the questions raised in all the > responses. > > Steve - this all started last week. You?re correct about our mixed > workload. There have been no new workloads that I am aware of. > > Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. > > Aaron - no, this is not on a DDN, either. > > The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the > servers and storage. We have two SAN ?stacks? 
and all NSD servers and > storage are connected to both stacks. Linux multipathing handles path > failures. 10 GbE out to the network. > > We first were alerted to this problem by one of our monitoring scripts > which was designed to alert us to abnormally high I/O times, which, as I > mentioned previously, in our environment has usually been caused by cache > battery backup failures in the storage array controllers (but _not_ this > time). So I?m getting e-mails that in part read: > > Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. > Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. > > The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN > on that storage array. As I?ve mentioned, those two LUNs are by far and > away my most frequent problem children, but here?s another report from > today as well: > > Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. > Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. > Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. > Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. > > NSD server hostnames have been changed, BTW, from their real names to nsd1 > - 8. > > Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm > dump nsd? output. We wrote a Python script to pull out what we think is > the most pertinent information: > > nsd1 > 29 SMALL queues, 50 requests pending, 3741 was the highest number of > requests pending. > 348 threads started, 1 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 0 requests pending, 5694 was the highest number of > requests pending. > 348 threads started, 124 threads active, 348 was the highest number of > threads active. > nsd2 > 29 SMALL queues, 0 requests pending, 1246 was the highest number of > requests pending. > 348 threads started, 13 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 470 requests pending, 2404 was the highest number of > requests pending. > 348 threads started, 340 threads active, 348 was the highest number of > threads active. > nsd3 > 29 SMALL queues, 108 requests pending, 1796 was the highest number of > requests pending. > 348 threads started, 0 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 35 requests pending, 3331 was the highest number of > requests pending. > 348 threads started, 4 threads active, 348 was the highest number of > threads active. > nsd4 > 42 SMALL queues, 0 requests pending, 1529 was the highest number of > requests pending. > 504 threads started, 8 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 637 was the highest number of > requests pending. > 504 threads started, 211 threads active, 504 was the highest number of > threads active. > nsd5 > 42 SMALL queues, 182 requests pending, 2798 was the highest number of > requests pending. > 504 threads started, 6 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 407 requests pending, 4416 was the highest number of > requests pending. > 504 threads started, 8 threads active, 504 was the highest number of > threads active. > nsd6 > 42 SMALL queues, 0 requests pending, 1630 was the highest number of > requests pending. > 504 threads started, 0 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 148 was the highest number of > requests pending. 
> 504 threads started, 9 threads active, 504 was the highest number of > threads active. > nsd7 > 42 SMALL queues, 43 requests pending, 2179 was the highest number of > requests pending. > 504 threads started, 1 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 2551 was the highest number of > requests pending. > 504 threads started, 13 threads active, 504 was the highest number of > threads active. > nsd8 > 42 SMALL queues, 0 requests pending, 1014 was the highest number of > requests pending. > 504 threads started, 4 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 3371 was the highest number of > requests pending. > 504 threads started, 89 threads active, 504 was the highest number of > threads active. > > Note that we see more ?load? on the LARGE queue side of things and that > nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most > frequently in our alerts) are the heaviest loaded. > > One other thing we have noted is that our home grown RRDtool monitoring > plots that are based on netstat, iostat, vmstat, etc. also show an oddity. > Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 > (there are 4 in total) show up as 93 - 97% utilized. And another oddity > there is that eon34A and eon34B rarely show up on the alert e-mails, while > eon34C and eon34E show up waaaayyyyyyy more than anything else ? the > difference between them is that A and B are on the storage array itself and > C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve > actually checked and reseated those connections). > > Another reason why I could not respond earlier today is that one of the > things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 > from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool > on those two boxes to 40 GB. That has not made a difference. How can I > determine how much of the pagepool is actually being used, BTW? A quick > Google search didn?t help me. > > So we?re trying to figure out if we have storage hardware issues causing > GPFS issues or GPFS issues causing storage slowdowns. The fact that I see > slowdowns most often on one storage array points in one direction, while > the fact that at times I see even worse slowdowns on multiple other arrays > points the other way. The fact that some NSD servers show better stats > than others in the analysis of the ?mmfsadm dump nsd? output tells me ? > well, I don?t know what it tells me. > > I think that?s all for now. If you have read this entire very long > e-mail, first off, thank you! If you?ve read it and have ideas for where I > should go from here, T-H-A-N-K Y-O-U! > > Kevin > > > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > > > Hi Kevin, > > > > Just going out on a very weird limb here...but you're not by chance > seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. > SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high > latency on some of our SFA12ks (that have otherwise been solid both in > terms of stability and performance) but only on certain volumes and the > affected volumes change. It's very bizzarre and we've been working closely > with DDN to track down the root cause but we've not yet found a smoking > gun. The timing and description of your problem sounded eerily similar to > what we're seeing so I'd thought I'd ask. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > > > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > > > >> Hi all, > >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some > of our NSDs as reported by ?mmdiag ?iohist" and are struggling to > understand why. One of the > >> confusing things is that, while certain NSDs tend to show the problem > more than others, the problem is not consistent ? i.e. the problem tends to > move around from > >> NSD to NSD (and storage array to storage array) whenever we check ? > which is sometimes just a few minutes apart. > >> In the past when I have seen ?mmdiag ?iohist? report high wait times > like this it has *always* been hardware related. In our environment, the > most common cause has > >> been a battery backup unit on a storage array controller going bad and > the storage array switching to write straight to disk. But that?s *not* > happening this time. > >> Is there anything within GPFS / outside of a hardware issue that I > should be looking for?? Thanks! > >> ? > >> Kevin Buterbaugh - Senior System Administrator > >> Vanderbilt University - Advanced Computing Center for Research and > Education > >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 6 22:03:09 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 6 Jul 2018 22:03:09 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: <865c5f52-fa62-571f-aeef-9b1073dfa156@strath.ac.uk> On 06/07/18 02:11, Buterbaugh, Kevin L wrote: [SNIP] > > The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for > the servers and storage. We have two SAN ?stacks? and all NSD > servers and storage are connected to both stacks. Linux multipathing > handles path failures. 10 GbE out to the network. You don't mention it, but have you investigated your FC fabric? 
Dodgy laser, bad photodiode or damaged fibre can cause havoc.
JAB.
-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
From Kevin.Buterbaugh at Vanderbilt.Edu Sat Jul 7 01:28:06 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Sat, 7 Jul 2018 00:28:06 +0000
Subject: [gpfsug-discuss] High I/O wait times
In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com>
Message-ID: <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu>
Hi All,
Another update on this issue as we have made significant progress today - but first let me address the two responses I received.
Alex - this is a good idea and yes, we did this today. We did see some higher latencies on one storage array as compared to the others: 10-20 ms on the "good" storage arrays, 50-60 ms on the one storage array. It took us a while to be able to do this because while the vendor provides a web management interface, that didn't show this information. But they have an actual app that will - and the Mac and Linux versions don't work. So we had to go scrounge up this thing called a Windows PC and get the software installed there. ;-)
Jonathan - also a good idea and yes, we also did this today. I'll explain as part of the rest of this update.
The main thing that we did today that has turned out to be most revealing is to take a list of all the NSDs in the impacted storage pool - 19 devices spread out over 7 storage arrays - and run read dd tests on all of them (the /dev/dm-XX multipath device). 15 of them showed rates of 33 - 100+ MB/sec, and the variation is almost definitely explained by the fact that they're in production use and getting hit by varying amounts of "real" work. But 4 of them showed rates of 2-10 MB/sec, and those 4 all happen to be on storage array eon34.
So, to try to rule out everything but the storage array, we replaced the FC cables going from the SAN switches to the array, plugging the new cables into different ports on the SAN switches. Then we repeated the dd tests from a different NSD server, which eliminated both the NSD server and its FC cables as a potential cause - and saw results virtually identical to the previous test. Therefore, we feel pretty confident that it is the storage array and have let the vendor know all of this.
And there's another piece of quite possibly relevant info - the last week in May one of the controllers in this array crashed and rebooted (it's an active-active dual controller array), and when that happened the failover occurred with a major glitch. One of the LUNs essentially disappeared - more accurately, it was there, but had no size! We've been using this particular vendor for 15 years now and I have seen more than a couple of their controllers go bad during that time, and nothing like this had ever happened before. They were never able to adequately explain what happened there. So what I am personally suspecting has happened is that whatever caused that one LUN to go MIA has caused these issues with the other LUNs on the array. As an aside, we ended up using mmfileid to identify the files that had blocks on the MIA LUN and restored those from tape backup.
I want to thank everyone who has offered their suggestions so far. I will update the list again once we have a definitive problem determination.
I hope that everyone has a great weekend.
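For readers who want to reproduce that kind of per-LUN read check, here is a minimal sketch. The device names are placeholders for the /dev/dm-XX multipath devices backing the affected NSDs, and the 1 GiB size per device is an arbitrary compromise between getting a useful number and adding extra load to production LUNs.

# Sketch of the read test described above. Substitute the real dm devices.
for dev in /dev/dm-10 /dev/dm-11 /dev/dm-12; do
    echo "== $dev =="
    # iflag=direct bypasses the page cache so a slow LUN cannot hide behind it;
    # dd prints the achieved MB/s on its final line of output.
    dd if="$dev" of=/dev/null bs=1M count=1024 iflag=direct 2>&1 | tail -n 1
done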
In the immortal words of the wisest man who ever lived, ?I?m kinda tired ? think I?ll go home now.? ;-) Kevin On Jul 6, 2018, at 12:13 PM, Alex Chekholko > wrote: Hi Kevin, This is a bit of a "cargo cult" suggestion but one issue that I have seen is if a disk starts misbehaving a bit but does not fail, it slows down the whole raid group that it is in. And the only way to detect it is to examine the read/write latencies on the individual disks. Does your SAN allow you to do that? That happened to me at least twice in my life and replacing the offending individual disk solved the issue. This was on DDN, so the relevant command were something like 'show pd * counters write_lat' or similar, which showed the latency for the I/Os for each disk. If one disk in the group is an outlier (e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for that one disk. Another possibility for troubleshooting, if you have sufficient free resources: you can just suspend the problematic LUNs in GPFS, as that will remove the write load from them, while still having them service read requests and not affecting users. Regards, Alex On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L > wrote: Hi Jim, Thank you for your response. We are taking a two-pronged approach at this point: 1. While I don?t see anything wrong with our storage arrays, I have opened a ticket with the vendor (not IBM) to get them to look at things from that angle. 2. Since the problem moves around from time to time, we are enhancing our monitoring script to see if we can basically go from ?mmdiag ?iohist? to ?clients issuing those I/O requests? to ?jobs running on those clients? to see if there is any commonality there. Thanks again - much appreciated! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 6, 2018, at 8:02 AM, Jim Doherty > wrote: You may want to get an mmtrace, but I suspect that the disk IOs are slow. The iohist is showing the time from when the start IO was issued until it was finished. Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it. If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L > wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). 
So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. 
In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Caa277914313f445d702e08d5e363d347%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664940252877301&sdata=bnjsWHwutbbKstghBrB5Y7%2FIzeX7U19vroW%2B0xA2gX8%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Jul 7 09:42:57 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 7 Jul 2018 09:42:57 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> Message-ID: <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. ?Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. ?Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. 
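A host-side complement to checking the switch logs: the Linux fc_host transport class exposes per-HBA link error counters, and values that keep climbing point at the same kind of fabric fault. The counter names below are the common ones, but some drivers omit a few, so treat the list as a starting point rather than a definitive set.

# Dump the per-HBA error counters on an NSD server; re-run later and compare.
for h in /sys/class/fc_host/host*; do
    echo "== $h =="
    for c in link_failure_count loss_of_sync_count loss_of_signal_count \
             invalid_crc_count invalid_tx_word_count error_frames dumped_frames; do
        [ -r "$h/statistics/$c" ] && printf '%-24s %s\n' "$c" "$(cat "$h/statistics/$c")"
    done
done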
The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Cameron.Dunn at bristol.ac.uk Fri Jul 6 17:36:14 2018 From: Cameron.Dunn at bristol.ac.uk (Cameron Dunn) Date: Fri, 6 Jul 2018 16:36:14 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , Message-ID: Thanks Christof, we had left out "gpfs" from the vfs objects = line in smb.conf so setting vfs objects = gpfs (etc) gpfs:hsm = yes gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) and setting the offline flag on the file by migrating it, so that # mmlsattr -L filename.jpg ... Misc attributes: ARCHIVE OFFLINE now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" and a standard icon with an X is displayed. But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, so we still risk a recall storm caused by them. All the best, Cameron ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Christof Schmitt Sent: 03 July 2018 20:37:08 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms > HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape > are then shared by Samba to Macs and PCs. > MacOS Finder and Windows Explorer will want to display all the thumbnail images of a > folder's contents, which will recall lots of files from tape. SMB clients can query file information, including the OFFLINE flag. With Spectrum Scale and the "gpfs" module loaded in Samba that is mapped from the the OFFLINE flag that is visible in "mmlsattr -L". In those systems, the SMB client can determine that a file is offline. In our experience this is handled correctly in Windows Explorer; when an "offline" file is encountered, no preview is generated from the file data. The Finder on Mac clients does not seem to honor the OFFLINE flag, thus the main problems are typically recall storms caused by Mac clients. > According to the Samba documentation this is preventable by setting the following > ---------------------------------------------- > https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html > > gpfs:recalls = [ yes | no ] > When this option is set to no, an attempt to open an offline file > will be rejected with access denied. > This helps preventing recall storms triggered by careless applications like Finder and Explorer. > > yes(default) - Open files that are offline. This will recall the files from HSM. > no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. > Using this setting also requires gpfs:hsm to be set to yes. > > gpfs:hsm = [ yes | no ] > Enable/Disable announcing if this FS has HSM enabled. > no(default) - Do not announce HSM. > yes - Announce HSM. > -------------------------------------------------- > > However we could not get this to work. 
> > On Centos7/Samba4.5, smb.conf contained > gpfs:hsm = yes > gpfs:recalls = no > (also tried setting gpfs:offline = yes, though this is not documented) These options apply to the "gpfs" module in Samba. The Samba version you are using needs to be built with GPFS support and the "gpfs" module needs to be loaded through the "vfs objects" configuration. As Centos7/Samba4.5 is mentioned, would guess that the CentOS provided Samba version is used, which is probably not compiled with GPFS support. >From IBM we would recommend to use CES for protocol services, which also provides Samba for SMB. The Samba provided through CES is configured so that the gpfs:recalls option can be used: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmsmb.htm gpfs:recalls If the value is set as yes files that have been migrated from disk will be recalled on access. By default, this is enabled. If recalls = no files will not be recalled on access and the client will receive ACCESS_DENIED message. > We made a share containing image files that were then migrated to tape by LTFS-EE, > to see if these flags were respected by OS X Finder or Windows Explorer. > > Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, > so that when browsing the stubs in the share, the files were recalled from tape > and the thumbnails displayed. > > Has anyone seen these flags working as they are supposed to ? Yes, they are working, as we use them in our Samba build. Debugging this would require looking at the Samba configuration and possibly collecting a trace. If my above assumption was wrong and this problem occurs with the CES Samba (gpfs.smb), please open a PMR for debugging this issue. If this is not the CES Samba, please contact the provider of the Samba package for additional support. Regards, Christof Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] preventing HSM tape recall storms Date: Tue, Jul 3, 2018 6:22 AM HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. 
On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Sun Jul 8 18:32:25 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 8 Jul 2018 20:32:25 +0300 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu><397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu><733478365.61492.1530882158667@mail.yahoo.com><1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> Message-ID: Hi Clean all counters on the FC switches and see which port have errors . For brocade run : slotstatsclear statsclear porterrshow For cisco run: clear countersall There might be bad gbic/cable/Storage gbic, which can affect the performance, if there is something like that - u can see which ports have errors grow over time. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 07/07/2018 11:43 AM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=TM-kJsvzTX9cq_xmR5ITHclBCfO4FDvZ3ZxyugfJCfQ&s=Ass164qVEhb9fC4_VCmzfZeYd_BLOv9cZsfkrzqi8pM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From chris.schlipalius at pawsey.org.au Mon Jul 9 01:36:01 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 09 Jul 2018 08:36:01 +0800 Subject: [gpfsug-discuss] Upcoming meeting: Australian Spectrum Scale Usergroup 10th August 2018 Sydney Message-ID: <2BD2D9AA-774D-4D6E-A2E6-069E7E91F40E@pawsey.org.au> Dear members, Please note the next Australian Usergroup is confirmed. If you plan to attend, please register: http://bit.ly/2NiNFEQ Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 10708 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 9 09:51:25 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 9 Jul 2018 08:51:25 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: Did you upgrade the memory etc purely as a "maybe this will help" fix? If so, and it didn't help, I'd be tempted to reduce it again as you may introduce another problem into the environment. I wonder if your disks are about to die, although I suspect you'd have already been forewarned of errors from the disk(s) via the storage system. 
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2018 02:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] High I/O wait times Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! 
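P.S. In case it is useful to anyone, the gist of that parsing script is roughly the following. This is only a sketch rather than the script we actually run; in particular the regular expression is a guess at the wording, since the layout of "mmfsadm dump nsd" output varies between releases, so it would need adjusting to match what your version prints.

import re
import subprocess

# Rough sketch: summarise SMALL/LARGE NSD queue activity from "mmfsadm dump nsd".
out = subprocess.check_output(
    ["/usr/lpp/mmfs/bin/mmfsadm", "dump", "nsd"], universal_newlines=True)

stats = {"SMALL": {"queues": 0, "pending": 0, "highest": 0},
         "LARGE": {"queues": 0, "pending": 0, "highest": 0}}

# Assumed line shape, e.g. "... SMALL queue ... 12 requests pending ... highest 3741 ..."
# Adjust the pattern to the exact wording your GPFS release uses.
pat = re.compile(r"(SMALL|LARGE)\b.*?(\d+)\s+requests pending.*?highest\s+(\d+)",
                 re.IGNORECASE)

for line in out.splitlines():
    m = pat.search(line)
    if not m:
        continue
    q = stats[m.group(1).upper()]
    q["queues"] += 1
    q["pending"] += int(m.group(2))
    q["highest"] = max(q["highest"], int(m.group(3)))

for qtype in ("SMALL", "LARGE"):
    s = stats[qtype]
    print("%d %s queues, %d requests pending, %d highest pending seen"
          % (s["queues"], qtype, s["pending"], s["highest"]))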
Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight > Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on >> some of our NSDs as reported by ?mmdiag ?iohist" and are struggling >> to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times >> like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator Vanderbilt University >> - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug > .org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterb > augh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3b > e4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D% > 2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Mon Jul 9 17:57:18 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 9 Jul 2018 09:57:18 -0700 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Hello, Can you provide the Windows OS and GPFS versions. Does the mmmount hang indefinitely or for a finite time (like 30 seconds or so)? Do you see any GPFS waiters during the mmmount hang? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michael Holliday To: gpfsug main discussion list Date: 07/05/2018 08:12 AM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Those commands show no errors not do any of the log files. GPFS has started correctly and showing the cluster and all nodes as up and active. We appear to have found the command that is hanging during the mount - However I?m not sure why its hanging. mmwmi mountedfilesystems Michael From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Yaron Daniel Sent: 20 June 2018 16:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Windows Mount Also what does mmdiag --network + mmgetstate -a show ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Yaron Daniel" To: gpfsug main discussion list Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 
06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From christof.schmitt at us.ibm.com Mon Jul 9 19:53:36 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 9 Jul 2018 18:53:36 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jul 9 19:57:38 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 9 Jul 2018 14:57:38 -0400 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: Another option is to request Apple to support the OFFLINE flag in the SMB protocol. The more Mac customers making such a request (I have asked others to do likewise) might convince Apple to add this checking to their SMB client. 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Christof Schmitt" To: gpfsug-discuss at spectrumscale.org Date: 07/09/2018 02:53 PM Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Sent by: gpfsug-discuss-bounces at spectrumscale.org > we had left out "gpfs" from the > vfs objects = > line in smb.conf > > so setting > vfs objects = gpfs (etc) > gpfs:hsm = yes > gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) Thank you for the update. gpfs:recalls=yes is the default, allowing recalls of files. If you set that to 'no', Samba will deny access to "OFFLINE" files in GPFS through SMB. > and setting the offline flag on the file by migrating it, so that > # mmlsattr -L filename.jpg > ... > Misc attributes: ARCHIVE OFFLINE > > now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" > > and a standard icon with an X is displayed. > > But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Yes, that is working as intended. While the file is only in the "external pool" (e.g. HSM tape), the OFFLINE flag is reported. Once you read/write data, that triggers a recall to the disk pool and the flag is cleared. > Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, > > so we still risk a recall storm caused by them. The question here would be how to handle the Mac clients. You could configured two SMB shares on the same path: One with gpfs:recalls=yes and tell the Windows users to access that share; the other one with gpfs:recalls=no and tell the Mac users to use that share. That would avoid the recall storms, but runs the risk of Mac users connecting to the wrong share and avoiding this workaround... Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Date: Sat, Jul 7, 2018 2:30 PM Thanks Christof, we had left out "gpfs" from the vfs objects = line in smb.conf so setting vfs objects = gpfs (etc) gpfs:hsm = yes gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) and setting the offline flag on the file by migrating it, so that # mmlsattr -L filename.jpg ... Misc attributes: ARCHIVE OFFLINE now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" and a standard icon with an X is displayed. But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, so we still risk a recall storm caused by them. All the best, Cameron From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Christof Schmitt Sent: 03 July 2018 20:37:08 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms > HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape > are then shared by Samba to Macs and PCs. > MacOS Finder and Windows Explorer will want to display all the thumbnail images of a > folder's contents, which will recall lots of files from tape. 
SMB clients can query file information, including the OFFLINE flag. With Spectrum Scale and the "gpfs" module loaded in Samba that is mapped from the the OFFLINE flag that is visible in "mmlsattr -L". In those systems, the SMB client can determine that a file is offline. In our experience this is handled correctly in Windows Explorer; when an "offline" file is encountered, no preview is generated from the file data. The Finder on Mac clients does not seem to honor the OFFLINE flag, thus the main problems are typically recall storms caused by Mac clients. > According to the Samba documentation this is preventable by setting the following > ---------------------------------------------- > https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html > > gpfs:recalls = [ yes | no ] > When this option is set to no, an attempt to open an offline file > will be rejected with access denied. > This helps preventing recall storms triggered by careless applications like Finder and Explorer. > > yes(default) - Open files that are offline. This will recall the files from HSM. > no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. > Using this setting also requires gpfs:hsm to be set to yes. > > gpfs:hsm = [ yes | no ] > Enable/Disable announcing if this FS has HSM enabled. > no(default) - Do not announce HSM. > yes - Announce HSM. > -------------------------------------------------- > > However we could not get this to work. > > On Centos7/Samba4.5, smb.conf contained > gpfs:hsm = yes > gpfs:recalls = no > (also tried setting gpfs:offline = yes, though this is not documented) These options apply to the "gpfs" module in Samba. The Samba version you are using needs to be built with GPFS support and the "gpfs" module needs to be loaded through the "vfs objects" configuration. As Centos7/Samba4.5 is mentioned, would guess that the CentOS provided Samba version is used, which is probably not compiled with GPFS support. >From IBM we would recommend to use CES for protocol services, which also provides Samba for SMB. The Samba provided through CES is configured so that the gpfs:recalls option can be used: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmsmb.htm gpfs:recalls If the value is set as yes files that have been migrated from disk will be recalled on access. By default, this is enabled. If recalls = no files will not be recalled on access and the client will receive ACCESS_DENIED message. > We made a share containing image files that were then migrated to tape by LTFS-EE, > to see if these flags were respected by OS X Finder or Windows Explorer. > > Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, > so that when browsing the stubs in the share, the files were recalled from tape > and the thumbnails displayed. > > Has anyone seen these flags working as they are supposed to ? Yes, they are working, as we use them in our Samba build. Debugging this would require looking at the Samba configuration and possibly collecting a trace. If my above assumption was wrong and this problem occurs with the CES Samba (gpfs.smb), please open a PMR for debugging this issue. If this is not the CES Samba, please contact the provider of the Samba package for additional support. 
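To illustrate the two-share approach mentioned above, a minimal smb.conf sketch could look like the following. Treat it purely as an illustration of the gpfs:recalls difference: the share names and path are invented, the vfs objects line would normally list other modules as well, and with CES the options would be set through the mmsmb command rather than by editing smb.conf directly.

[images]
# share published to Windows clients; offline files are recalled on open
path = /gpfs/fs0/images
vfs objects = gpfs
gpfs:hsm = yes
gpfs:recalls = yes

[images-norecall]
# same directory, published to Mac clients; opening an offline file
# returns ACCESS_DENIED instead of triggering a tape recall
path = /gpfs/fs0/images
vfs objects = gpfs
gpfs:hsm = yes
gpfs:recalls = no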
Regards, Christof Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] preventing HSM tape recall storms Date: Tue, Jul 3, 2018 6:22 AM HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 20:31:32 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 19:31:32 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 9 21:21:29 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 9 Jul 2018 20:21:29 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> I don?t think you can do it directly, but you could probably use FileHeat to figure it out indirectly. Look at mmchconfig on how to set these: fileHeatLossPercent 20 fileHeatPeriodMinutes 1440 Then you can run a fairly simple policy scan to dump out the file names and heat value, sort what?s the most active to the top. I?ve done this, and it can prove helpful: define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Monday, July 9, 2018 at 3:04 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] What NSDs does a file have blocks on? Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Mon Jul 9 21:51:34 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 9 Jul 2018 16:51:34 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> References: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Message-ID: Hi Kevin, >>I want to know what NSDs a single file has its? blocks on? You may use /usr/lpp/mmfs/samples/fpo/mmgetlocationto obtain the file-to-NSD block layout map. Use the -h option for this tools usage ( mmgetlocation -h). Sample output is below: # File-system block size is 4MiB and sample file is 40MiB. 
# ls -lh /mnt/gpfs3a/data_out/lf -rw-r--r-- 1 root root 40M Jul 9 16:42 /mnt/gpfs3a/data_out/lf # du -sh /mnt/gpfs3a/data_out/lf 40M /mnt/gpfs3a/data_out/lf # mmlsfs gpfs3a | grep 'Block size' -B 4194304 Block size # The file data is striped across 10 x NSDs (DMD_NSDX) constituting the file-system # /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /mnt/gpfs3a/data_out/lf [FILE /mnt/gpfs3a/data_out/lf INFORMATION] FS_DATA_BLOCKSIZE : 4194304 (bytes) FS_META_DATA_BLOCKSIZE : 4194304 (bytes) FS_FILE_DATAREPLICA : 1 FS_FILE_METADATAREPLICA : 1 FS_FILE_STORAGEPOOLNAME : system FS_FILE_ALLOWWRITEAFFINITY : no FS_FILE_WRITEAFFINITYDEPTH : 0 FS_FILE_BLOCKGROUPFACTOR : 1 chunk(s)# 0 (offset 0) : [DMD_NSD5 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 1 (offset 4194304) : [DMD_NSD6 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 2 (offset 8388608) : [DMD_NSD7 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 3 (offset 12582912) : [DMD_NSD8 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 4 (offset 16777216) : [DMD_NSD9 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 5 (offset 20971520) : [DMD_NSD10 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 6 (offset 25165824) : [DMD_NSD1 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 7 (offset 29360128) : [DMD_NSD2 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 8 (offset 33554432) : [DMD_NSD3 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 9 (offset 37748736) : [DMD_NSD4 c72f1m5u39ib0,c72f1m5u37ib0] [FILE: /mnt/gpfs3a/data_out/lf SUMMARY INFO] replica1: c72f1m5u37ib0,c72f1m5u39ib0: 5 chunk(s) c72f1m5u39ib0,c72f1m5u37ib0: 5 chunk(s) Thanks and Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/09/2018 04:05 PM Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 9 22:04:15 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 9 Jul 2018 17:04:15 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> Message-ID: (psss... ) tsdbfs Not responsible for anything bad that happens...! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 22:03:21 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 21:03:21 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: References: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Message-ID: <7D0DA547-4C19-4AE8-AFF8-BB0FBBF487AA@vanderbilt.edu> Hi Kums, Thanks so much ? this gave me exactly what I was looking for and the output was what I suspected I would see. 
Unfortunately, that means that the mystery of why we?re having these occasional high I/O wait times persists, but oh well? Kevin On Jul 9, 2018, at 3:51 PM, Kumaran Rajaram > wrote: Hi Kevin, >>I want to know what NSDs a single file has its? blocks on? You may use /usr/lpp/mmfs/samples/fpo/mmgetlocationto obtain the file-to-NSD block layout map. Use the -h option for this tools usage (mmgetlocation -h). Sample output is below: # File-system block size is 4MiB and sample file is 40MiB. # ls -lh /mnt/gpfs3a/data_out/lf -rw-r--r-- 1 root root 40M Jul 9 16:42 /mnt/gpfs3a/data_out/lf # du -sh /mnt/gpfs3a/data_out/lf 40M /mnt/gpfs3a/data_out/lf # mmlsfs gpfs3a | grep 'Block size' -B 4194304 Block size # The file data is striped across 10 x NSDs (DMD_NSDX) constituting the file-system # /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /mnt/gpfs3a/data_out/lf [FILE /mnt/gpfs3a/data_out/lf INFORMATION] FS_DATA_BLOCKSIZE : 4194304 (bytes) FS_META_DATA_BLOCKSIZE : 4194304 (bytes) FS_FILE_DATAREPLICA : 1 FS_FILE_METADATAREPLICA : 1 FS_FILE_STORAGEPOOLNAME : system FS_FILE_ALLOWWRITEAFFINITY : no FS_FILE_WRITEAFFINITYDEPTH : 0 FS_FILE_BLOCKGROUPFACTOR : 1 chunk(s)# 0 (offset 0) : [DMD_NSD5 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 1 (offset 4194304) : [DMD_NSD6 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 2 (offset 8388608) : [DMD_NSD7 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 3 (offset 12582912) : [DMD_NSD8 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 4 (offset 16777216) : [DMD_NSD9 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 5 (offset 20971520) : [DMD_NSD10 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 6 (offset 25165824) : [DMD_NSD1 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 7 (offset 29360128) : [DMD_NSD2 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 8 (offset 33554432) : [DMD_NSD3 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 9 (offset 37748736) : [DMD_NSD4 c72f1m5u39ib0,c72f1m5u37ib0] [FILE: /mnt/gpfs3a/data_out/lf SUMMARY INFO] replica1: c72f1m5u37ib0,c72f1m5u39ib0: 5 chunk(s) c72f1m5u39ib0,c72f1m5u37ib0: 5 chunk(s) Thanks and Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/09/2018 04:05 PM Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C523052f2a40c48efb5a808d5e5ddc6b0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636667663044944884&sdata=Q2Wg8yDwA9yu%2FZgJXELr7V3qHAY7I7eKPTBHkqVKA5I%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Mon Jul 9 22:21:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 9 Jul 2018 21:21:41 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> Message-ID: <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 22:44:07 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 21:44:07 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <20180708174441.EE5BB17B422@gpfsug.org> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> <20180708174441.EE5BB17B422@gpfsug.org> Message-ID: Hi All, Time for a daily update on this saga? First off, responses to those who have responded to me: Yaron - we have QLogic switches, but I?ll RTFM and figure out how to clear the counters ? with a quick look via the CLI interface to one of them I don?t see how to even look at those counters, must less clear them, but I?ll do some digging. QLogic does have a GUI app, but given that the Mac version is PowerPC only I think that?s a dead end! :-O Jonathan - understood. We were just wanting to eliminate as much hardware as potential culprits as we could. The storage arrays will all get a power-cycle this Sunday when we take a downtime to do firmware upgrades on them ? the vendor is basically refusing to assist further until we get on the latest firmware. So ? we had noticed that things seem to calm down starting Friday evening and continuing throughout the weekend. We have a script that runs every half hour and if there?s any NSD servers where ?mmdiag ?iohist? shows an I/O > 1,000 ms, we get an alert (again, designed to alert us of a CBM failure). We only got three all weekend long (as opposed to last week, when the alerts were coming every half hour round the clock). Then, this morning I repeated the ?dd? test that I had run before and after replacing the FC cables going to ?eon34? and which had showed very typical I/O rates for all the NSDs except for the 4 in eon34, which were quite poor (~1.5 - 10 MB/sec). I ran the new tests this morning from different NSD servers and with a higher ?count? passed to dd to eliminate any potential caching effects. I ran the test twice from two different NSD servers and this morning all NSDs - including those on eon34 - showed normal I/O rates! Argh - so do we have a hardware problem or not?!? I still think we do, but am taking *nothing* for granted at this point! So today we also used another script we?ve written to do some investigation ? basically we took the script which runs ?mmdiag ?iohist? 
and added some options to it so that for every I/O greater than the threshold it will see which client issued the I/O. It then queries SLURM to see what jobs are running on that client. Interestingly enough, one user showed up waaaayyyyyy more often than anybody else. And many times she was on a node with only one other user who we know doesn?t access the GPFS filesystem and other times she was the only user on the node. We certainly recognize that correlation is not causation (she could be a victim and not the culprit), but she was on so many of the reported clients that we decided to investigate further ? but her jobs seem to have fairly modest I/O requirements. Each one processes 4 input files, which are basically just gzip?d text files of 1.5 - 5 GB in size. This is what, however, prompted my other query to the list about determining which NSDs a given file has its? blocks on. I couldn?t see how files of that size could have all their blocks on only a couple of NSDs in the pool (out of 19 total!) but wanted to verify that. The files that I have looked at are evenly spread out across the NSDs. So given that her files are spread across all 19 NSDs in the pool and the high I/O wait times are almost always only on LUNs in eon34 (and, more specifically, on two of the four LUNs in eon34) I?m pretty well convinced it?s not her jobs causing the problems ? I?m back to thinking a weird hardware issue. But if anyone wants to try to convince me otherwise, I?ll listen? Thanks! Kevin On Jul 8, 2018, at 12:32 PM, Yaron Daniel > wrote: Hi Clean all counters on the FC switches and see which port have errors . For brocade run : slotstatsclear statsclear porterrshow For cisco run: clear countersall There might be bad gbic/cable/Storage gbic, which can affect the performance, if there is something like that - u can see which ports have errors grow over time. Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org Date: 07/07/2018 11:43 AM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=TM-kJsvzTX9cq_xmR5ITHclBCfO4FDvZ3ZxyugfJCfQ&s=Ass164qVEhb9fC4_VCmzfZeYd_BLOv9cZsfkrzqi8pM&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7c1ced16f6d44055c63408d5e4fa7d2e%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636666686866066749&sdata=Viltitj3L9aScuuVKCLSp9FKkj7xdzWxsvvPVDSUqHw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 10 12:59:18 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 10 Jul 2018 11:59:18 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Tue Jul 10 13:29:59 2018 From: spectrumscale at kiranghag.com (KG) Date: Tue, 10 Jul 2018 17:59:59 +0530 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: Addendum to the question... How is this calculated? I figured out it is based on NSD sizes that are initially used but not exactly how. ?KG? On Tue, Jul 10, 2018 at 5:29 PM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > File system was originally created with 1TB NSDs (4) and I want to move it > to one 5TB NSD. Any way around this error? > > > > mmadddisk fs1 -F new.nsd > > > > The following disks of proserv will be formatted on node srv-gpfs06: > > stor1v5tb85: size 5242880 MB > > Extending Allocation Map > > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > > *Allocation map cannot accommodate disks larger than 4194555 MB.* > > Checking Allocation Map for storage pool Plevel1 > > mmadddisk: tsadddisk failed. > > Verifying file system configuration information ... > > mmadddisk: Propagating the cluster configuration data to all > > affected nodes. This is an asynchronous process. > > mmadddisk: Command failed. Examine previous error messages to determine > cause. > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Tue Jul 10 13:42:55 2018 From: david_johnson at brown.edu (David D Johnson) Date: Tue, 10 Jul 2018 08:42:55 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: Whenever we start with adding disks of new sizes/speeds/controllers/machine rooms compared to existing NSD's in the filesystem, we generally add them to a new storage pool. Add policy rules to make use of the new pools as desired, migrate stale files to slow disk, active files to faster/newer disk, etc. > On Jul 10, 2018, at 8:29 AM, KG wrote: > > Addendum to the question... > > How is this calculated? I figured out it is based on NSD sizes that are initially used but not exactly how. > > > ?KG? > > On Tue, Jul 10, 2018 at 5:29 PM, Oesterlin, Robert > wrote: > File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? > > > > mmadddisk fs1 -F new.nsd > > > > The following disks of proserv will be formatted on node srv-gpfs06: > > stor1v5tb85: size 5242880 MB > > Extending Allocation Map > > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > > Allocation map cannot accommodate disks larger than 4194555 MB. > > Checking Allocation Map for storage pool Plevel1 > > mmadddisk: tsadddisk failed. > > Verifying file system configuration information ... > > mmadddisk: Propagating the cluster configuration data to all > > affected nodes. This is an asynchronous process. > > mmadddisk: Command failed. Examine previous error messages to determine cause. > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jul 10 14:00:48 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 10 Jul 2018 14:00:48 +0100 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: <1531227648.26036.139.camel@strath.ac.uk> On Mon, 2018-07-09 at 14:57 -0400, Frederick Stock wrote: > Another option is to request Apple to support the OFFLINE flag in the > SMB protocol. ?The more Mac customers making such a request (I have > asked others to do likewise) might convince Apple to add this > checking to their SMB client. > And we have a winner. The only workable solution is to get Apple to Finder to support the OFFLINE flag. However good luck getting Apple to actually do anything. An alternative approach might be to somehow detect the client connecting is running MacOS and prohibit recalls for them. However I am not sure the Samba team would be keen on accepting such patches unless it could be done in say VFS module. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From makaplan at us.ibm.com Tue Jul 10 14:08:45 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 10 Jul 2018 09:08:45 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? 
In-Reply-To: <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> Message-ID: As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson To: gpfsug main discussion list Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jul 10 14:12:02 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 10 Jul 2018 14:12:02 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> <20180708174441.EE5BB17B422@gpfsug.org> Message-ID: <1531228322.26036.143.camel@strath.ac.uk> On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote: [SNIP] > Interestingly enough, one user showed up waaaayyyyyy more often than > anybody else. ?And many times she was on a node with only one other > user who we know doesn?t access the GPFS filesystem and other times > she was the only user on the node. ? > I have seen on our old HPC system which had been running fine for three years a particular user with a particular piece of software with presumably a particular access pattern trigger a firmware bug in a SAS drive (local disk to the node) that caused it to go offline (dead to the world and power/presence LED off) and only a power cycle of the node would bring it back. At first we through the drives where failing, because what the hell, but in the end a firmware update to the drives and they where fine. The moral of the story is don't rule out wacky access patterns from a single user causing problems. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Tue Jul 10 15:28:57 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 10 Jul 2018 16:28:57 +0200 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: Hi Bob, you sure the first added NSD was 1 TB? 
As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From p.childs at qmul.ac.uk Tue Jul 10 15:50:54 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 14:50:54 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes Message-ID: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. 
Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London From bpappas at dstonline.com Tue Jul 10 16:08:03 2018 From: bpappas at dstonline.com (Bill Pappas) Date: Tue, 10 Jul 2018 15:08:03 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms (Bill Pappas) In-Reply-To: References: Message-ID: Years back I did run a trial (to buy) software solution on OSX to address this issue. It worked! It was not cheap and they probably no longer support it anyway. It might have been from a company called Group Logic. I would suggest not exposing HSM enabled file systems (in particular ones using tape on the back end) to your general CIFS (or even) GPFS/NFS clients. It produced years (2011-2015 of frustration with recall storms that made everyone mad. If someone else had success, I think we'd all like to know how they did it....but we gave up on that. In the end I would suggest setting up an explicit archive location using/HSM tape (or low cost, high densisty disk) that is not pointing to your traditional GPFS/CIFS/NFS clients that users must deliberately access (think portal) to check in/out cold data that they can stage to their primary workspace. It is possible you considered this idea or some variation of it anyway and rejected it for good reason (e.g. more pain for the users to stage data over from cold storage to primary workspacec). Bill Pappas 901-619-0585 bpappas at dstonline.com [1466780990050_DSTlogo.png] ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org Sent: Tuesday, July 10, 2018 9:50 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 78, Issue 32 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: preventing HSM tape recall storms (Jonathan Buzzard) 2. Re: What NSDs does a file have blocks on? (Marc A Kaplan) 3. Re: High I/O wait times (Jonathan Buzzard) 4. Re: Allocation map limits - any way around this? (Uwe Falke) 5. Same file opened by many nodes / processes (Peter Childs) ---------------------------------------------------------------------- Message: 1 Date: Tue, 10 Jul 2018 14:00:48 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Message-ID: <1531227648.26036.139.camel at strath.ac.uk> Content-Type: text/plain; charset="UTF-8" On Mon, 2018-07-09 at 14:57 -0400, Frederick Stock wrote: > Another option is to request Apple to support the OFFLINE flag in the > SMB protocol. ?The more Mac customers making such a request (I have > asked others to do likewise) might convince Apple to add this > checking to their SMB client. > And we have a winner. The only workable solution is to get Apple to Finder to support the OFFLINE flag. However good luck getting Apple to actually do anything. An alternative approach might be to somehow detect the client connecting is running MacOS and prohibit recalls for them. 
However I am not sure the Samba team would be keen on accepting such patches unless it could be done in say VFS module. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ Message: 2 Date: Tue, 10 Jul 2018 09:08:45 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: Content-Type: text/plain; charset="utf-8" As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson To: gpfsug main discussion list Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Tue, 10 Jul 2018 14:12:02 +0100 From: Jonathan Buzzard To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] High I/O wait times Message-ID: <1531228322.26036.143.camel at strath.ac.uk> Content-Type: text/plain; charset="UTF-8" On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote: [SNIP] > Interestingly enough, one user showed up waaaayyyyyy more often than > anybody else. ?And many times she was on a node with only one other > user who we know doesn?t access the GPFS filesystem and other times > she was the only user on the node. ? > I have seen on our old HPC system which had been running fine for three years a particular user with a particular piece of software with presumably a particular access pattern trigger a firmware bug in a SAS drive (local disk to the node) that caused it to go offline (dead to the world and power/presence LED off) and only a power cycle of the node would bring it back. At first we through the drives where failing, because what the hell, but in the end a firmware update to the drives and they where fine. The moral of the story is don't rule out wacky access patterns from a single user causing problems. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG ------------------------------ Message: 4 Date: Tue, 10 Jul 2018 16:28:57 +0200 From: "Uwe Falke" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: Content-Type: text/plain; charset="ISO-8859-1" Hi Bob, you sure the first added NSD was 1 TB? 
As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 5 Date: Tue, 10 Jul 2018 14:50:54 +0000 From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Same file opened by many nodes / processes Message-ID: <4e038c492713f418242be208532e112f8ea50a9f.camel at qmul.ac.uk> Content-Type: text/plain; charset="utf-8" We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. 
Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Outlook-1466780990.png Type: image/png Size: 6282 bytes Desc: Outlook-1466780990.png URL: From salut4tions at gmail.com Tue Jul 10 16:54:36 2018 From: salut4tions at gmail.com (Jordan Robertson) Date: Tue, 10 Jul 2018 11:54:36 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: To second David's comments: I don't believe changing the max NSD size for a given storage pool is possible (it may be tied to the per-pool allocation mapping?), so if you want to add more dataOnly NSD's to a filesystem and get that error you may need to create a new pool. The tricky bit is that I think this only works with dataOnly NSD's, as dataAndMetadata and metadataOnly NSD's only get added to the system pool which is locked in like any other. -Jordan On Tue, Jul 10, 2018 at 10:28 AM, Uwe Falke wrote: > Hi Bob, > you sure the first added NSD was 1 TB? As often as i created a FS, the max > NSD size was way larger than the one I added initially , not just the > fourfold. > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Thomas Wolter, Sven Schoo? > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 10/07/2018 13:59 > Subject: [gpfsug-discuss] Allocation map limits - any way around > this? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > File system was originally created with 1TB NSDs (4) and I want to move it > to one 5TB NSD. Any way around this error? > > mmadddisk fs1 -F new.nsd > > The following disks of proserv will be formatted on node srv-gpfs06: > stor1v5tb85: size 5242880 MB > Extending Allocation Map > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > Allocation map cannot accommodate disks larger than 4194555 MB. > Checking Allocation Map for storage pool Plevel1 > mmadddisk: tsadddisk failed. > Verifying file system configuration information ... > mmadddisk: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > mmadddisk: Command failed. Examine previous error messages to determine > cause. 
> > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Tue Jul 10 16:59:57 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 10 Jul 2018 17:59:57 +0200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi, Peter, in theory, the first node opening a file should remain metanode until it closes the file, regardless how many other nodes open it in between (if all the nodes are within the same cluster). MFT is controlling the caching inodes and - AFAIK - also of indirect blocks. A 200 GiB file will most likely have indirect blocks, but just a few up to some tens, depending on the block size in the file system. The default MFT number is much larger. However, if you say the metanode is changing, that might cause some delays, as all token information has to be passed on to the next metanode (not sure how efficient that election is running). Having said that it could help if you use a dedicated node having the file open from start and all the time - this should prevent new metanodes being elected. If you do not get told a solution, you might want to run a trace of the mmbackup scan (maybe once with jobs accessing the file, once without). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 10/07/2018 16:51 Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. 
It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Tue Jul 10 17:15:14 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 10 Jul 2018 12:15:14 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: I would start by making sure that the application(s)... open the file O_RDONLY and then you may want to fiddle with the GPFS atime settings: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm At first I thought "uge" was a typo, but I guess you are referring to: https://supcom.hgc.jp/english/utili_info/manual/uge.html Still not begin familiar, it would be "interesting" to know from a file operations point of view, what's going on in terms of opens, reads, closes : per second. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 17:17:58 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 12:17:58 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. 
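For example, for the pool named in the earlier post the limit appears in the pool header line of mmdf (command is standard; the size shown below is only illustrative, trimmed output):

   mmdf fs1 -P Plevel1
   ...
   Disks in storage pool: Plevel1 (Maximum disk size allowed is 4.0 TB)
   ...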
Fred Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jordan Robertson To: gpfsug main discussion list Date: 07/10/2018 11:54 AM Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org To second David's comments: I don't believe changing the max NSD size for a given storage pool is possible (it may be tied to the per-pool allocation mapping?), so if you want to add more dataOnly NSD's to a filesystem and get that error you may need to create a new pool. The tricky bit is that I think this only works with dataOnly NSD's, as dataAndMetadata and metadataOnly NSD's only get added to the system pool which is locked in like any other. -Jordan On Tue, Jul 10, 2018 at 10:28 AM, Uwe Falke wrote: Hi Bob, you sure the first added NSD was 1 TB? As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 10 17:29:42 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 10 Jul 2018 16:29:42 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> Right - but it doesn?t give me the answer on how to best get around it. :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Tuesday, July 10, 2018 at 11:18 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Tue Jul 10 17:59:17 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Tue, 10 Jul 2018 12:59:17 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> References: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> Message-ID: <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> I would as I suggested add the new NSD into a new pool in the same filesystem. Then I would migrate all the files off the old pool onto the new one. At this point you can deldisk the old ones or decide what else you?d want to do with them. -- ddj Dave Johnson > On Jul 10, 2018, at 12:29 PM, Oesterlin, Robert wrote: > > Right - but it doesn?t give me the answer on how to best get around it. :-) > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > From: on behalf of IBM Spectrum Scale > Reply-To: gpfsug main discussion list > Date: Tuesday, July 10, 2018 at 11:18 AM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? > > The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. > > Fred > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scrusan at ddn.com Tue Jul 10 18:09:48 2018 From: scrusan at ddn.com (Steve Crusan) Date: Tue, 10 Jul 2018 17:09:48 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: <4E48904C-5B98-485B-B577-85532C7593A8@ddn.com> I?ve used ?preferDesignatedMnode=1? in the past, but that was for a specific usecase, and that would have to come from the direction of support. 
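A less invasive way to keep the metanode from wandering, per Uwe's earlier note that the first opener of a file keeps the role until it closes it, is simply to hold a read-only handle open from one chosen node for the life of the job array. A rough shell sketch, with the path purely illustrative:

   # run from a long-lived shell or wrapper script on the chosen node
   exec 9< /gpfs/fs1/shared/big_input.dat   # take and hold a read-only descriptor
   # ... leave fd 9 open while the array jobs run ...
   exec 9<&-                                # release it once the jobs finish
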
I guess if you wanted to test your metanode theory, you could open that file (and keep it open) on node from a different remote cluster, or one of your local NSD servers and see what kind of results you get out of it. ---- Steve Crusan scrusan at ddn.com (719) 695-3190 From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Tuesday, July 10, 2018 at 11:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes I would start by making sure that the application(s)... open the file O_RDONLY and then you may want to fiddle with the GPFS atime settings: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm At first I thought "uge" was a typo, but I guess you are referring to: https://supcom.hgc.jp/english/utili_info/manual/uge.html Still not begin familiar, it would be "interesting" to know from a file operations point of view, what's going on in terms of opens, reads, closes : per second. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 18:19:47 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 13:19:47 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? 
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Tue Jul 10 19:35:28 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Tue, 10 Jul 2018 11:35:28 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi, many thanks to all of the suggestions for how to deal with this issue. Ftr, I tried this mmchnode --noquorum -N --force on the node that was reinstalled which reinstated some of the communications between the cluster nodes, but then when I restarted the cluster, communications begain to fail again, complaining about not enough CCR nodes for quorum. I ended up reinstalling the cluster since at this point the nodes couldn't mount the remote data and I thought it would be faster. Thanks again for all of the responses, Renata Dart SLAC National Accelerator Lab On Wed, 27 Jun 2018, IBM Spectrum Scale wrote: > >Hi Renata, > >You may want to reduce the set of quorum nodes. If your version supports >the --force option, you can run > >mmchnode --noquorum -N --force > >It is a good idea to configure tiebreaker disks in a cluster that has only >2 quorum nodes. 
> >Regards, The Spectrum Scale (GPFS) team > >------------------------------------------------------------------------------------------------------------------ > >If you feel that your question can benefit other users of Spectrum Scale >(GPFS), then please post it to the public IBM developerWroks Forum at >https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > >If your query concerns a potential software error in Spectrum Scale (GPFS) >and you have an IBM software maintenance contract please contact >1-800-237-5511 in the United States or your local IBM Service Center in >other countries. > >The forum is informally monitored as time permits and should not be used >for priority messages to the Spectrum Scale (GPFS) team. > > > >From: Renata Maria Dart >To: gpfsug-discuss at spectrumscale.org >Date: 06/27/2018 02:21 PM >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues >Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving >data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine >cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine >cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > From bbanister at jumptrading.com Tue Jul 10 21:50:23 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 10 Jul 2018 20:50:23 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> References: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> Message-ID: +1 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of david_johnson at brown.edu Sent: Tuesday, July 10, 2018 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Note: External Email ________________________________ I would as I suggested add the new NSD into a new pool in the same filesystem. Then I would migrate all the files off the old pool onto the new one. At this point you can deldisk the old ones or decide what else you?d want to do with them. 
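Concretely, that sequence would look something like the sketch below; the new pool name, failure group, and old disk names are placeholders, so adjust them to your own layout:

   # stanza for the new 5 TB NSD, targeted at a new data pool
   %nsd: nsd=stor1v5tb85 usage=dataOnly failureGroup=1 pool=Plevel2

   mmadddisk fs1 -F new.nsd    # the new pool gets a fresh allocation map sized for the 5 TB disk

   # drain the old pool with the policy engine; drain.pol contains:
   #   RULE 'drain' MIGRATE FROM POOL 'Plevel1' TO POOL 'Plevel2'
   mmapplypolicy fs1 -P drain.pol -I yes

   # once Plevel1 is empty, remove its disks (names here are made up)
   mmdeldisk fs1 "stor1v1tb81;stor1v1tb82;stor1v1tb83;stor1v1tb84"

You would also want to point the installed placement rule (mmchpolicy) at the new pool so newly created files land there rather than back in Plevel1.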
-- ddj Dave Johnson On Jul 10, 2018, at 12:29 PM, Oesterlin, Robert > wrote: Right - but it doesn?t give me the answer on how to best get around it. :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of IBM Spectrum Scale > Reply-To: gpfsug main discussion list > Date: Tuesday, July 10, 2018 at 11:18 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. Fred _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jul 10 22:06:27 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 21:06:27 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. 
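For anyone wanting to reproduce those figures: they are the cumulative per-node io_s counters from mmpmon, collected with something along these lines (flags from memory, so check against your release):

   echo io_s | /usr/lpp/mmfs/bin/mmpmon -p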
It's a mixed speed ethernet network mainly 10GB connected, although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage). We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster?
In-Reply-To: References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> Message-ID: <5565130575454bf7a80802ecd55faec3@jumptrading.com> I know we are trying to be helpful, but suggesting that admins mess with undocumented, dangerous commands isn?t a good idea. If directed from an IBM support person with explicit instructions, then good enough, IFF it?s really required and worth the risk! I think the Kum?s suggestions are definitely a right way to handle this. In general, avoid running ts* commands unless directed by somebody that knows exactly what they are doing and understands your issue in great detail!! Just a word to the wise.. 2 cents? etc, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Tuesday, July 10, 2018 8:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Note: External Email ________________________________ As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson > To: gpfsug main discussion list > Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: > on behalf of "makaplan at us.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jul 10 22:23:34 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 21:23:34 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, , Message-ID: Oh the cluster is 296 nodes currently with a set size of 300 (mmcrfs -n 300) We're currently looking to upgrade the 1G connected nodes to 10G within the next few months. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Peter Childs wrote ---- The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 23:15:01 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 18:15:01 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, , Message-ID: Regarding the permissions on the file I assume you are not using ACLs, correct? If you are then you would need to check what the ACL allows. Is your metadata on separate NSDs? Having metadata on separate NSDs, and preferably fast NSDs, would certainly help your mmbackup scanning. Have you looked at the information from netstat or similar network tools to see how your network is performing? Faster networks generally require a bit of OS tuning and some GPFS tuning to optimize their performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: gpfsug main discussion list Date: 07/10/2018 05:23 PM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org Oh the cluster is 296 nodes currently with a set size of 300 (mmcrfs -n 300) We're currently looking to upgrade the 1G connected nodes to 10G within the next few months. 
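Since the reply above points at the network, a quick sanity check on one of the 1GbE clients might look roughly like the following; the mmpmon keyword is standard, but check the man page for the exact invocation at your code level:

# Kernel view: retransmissions are a good hint of a saturated 1G link
netstat -s | grep -i retrans

# GPFS's own view of its connections from this node
mmdiag --network

# Per-filesystem read/write counters via mmpmon
cat > /tmp/mmpmon.in <<'EOF'
fs_io_s
EOF
mmpmon -p -i /tmp/mmpmon.in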
Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Peter Childs wrote ---- The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. 
It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Wed Jul 11 13:30:16 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 11 Jul 2018 14:30:16 +0200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From heiner.billich at psi.ch Wed Jul 11 14:40:46 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Wed, 11 Jul 2018 13:40:46 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown Message-ID: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Hello, I have two nodes which hang on ?mmshutdown?, in detail the command ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I wonder if this looks familiar to somebody? Is it a known bug? I can avoid the issue if I reduce pagepool from 128G to 64G. Running ?systemctl stop gpfs? shows the same issue. It forcefully terminates after a while, but ?rmmod? stays stuck. Two functions cxiReleaseAndForgetPages and put_page seem to be involved, the first part of gpfs, the second a kernel call. The servers have 256G memory and 72 (virtual) cores each. I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. I can try to switch back to 5.0.0 Thank you & kind regards, Heiner Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum Scale service process is running Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is not able to form a quorum with the other available nodes. Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 [preauth] Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[rmmod:2695] Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] [] put_compound_page+0xc3/0x174 Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: 00000246 Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: 00000000fae3d201 RCX: 0000000000000284 Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: 0000000000000246 RDI: ffffea003d478000 Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: ffff881ffae3d1e0 R09: 0000000180800059 Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: ffffea007feb8f40 R12: 00000000fae3d201 Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: 0000000000000000 R15: ffff88161977bd40 Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:12:41 node-1.x.y kernel: Call Trace: Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 Jul 11 14:12:41 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] ? 
kmem_cache_free+0x1e2/0x200 Jul 11 14:12:41 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:12:41 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 41 0f ba 2c Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. Terminating. Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [rmmod:2695] Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jul 11 14:13:27 node-1.x.y kernel: { Jul 11 14:13:27 node-1.x.y kernel: 28 Jul 11 14:13:27 node-1.x.y kernel: } Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, g=267734, c=267733, q=36089) Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: Jul 11 14:13:27 node-1.x.y kernel: rmmod R Jul 11 14:13:27 node-1.x.y kernel: running task Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __free_slab+0xdc/0x200 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? 
system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] [] __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: 00000282 Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: 0000000000000135 RCX: 00000000000001c1 Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: 0000000000000246 RDI: ffffea00650e7040 Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: ffff881ffae3df60 R09: 0000000180800052 Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: ffffea007feb8f40 R12: ffff881ffae3df60 Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: 00000000fae3db01 R15: ffffea007feb8f40 Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 48 89 fb f6 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Wed Jul 11 14:47:06 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 11 Jul 2018 06:47:06 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Message-ID: Hi, what does numactl -H report ? also check if this is set to yes : root at fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello, > > > > I have two nodes which hang on ?mmshutdown?, in detail the command > ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I > wonder if this looks familiar to somebody? Is it a known bug? I can avoid > the issue if I reduce pagepool from 128G to 64G. > > > > Running ?systemctl stop gpfs? shows the same issue. It forcefully > terminates after a while, but ?rmmod? stays stuck. > > > > Two functions cxiReleaseAndForgetPages and put_page seem to be involved, > the first part of gpfs, the second a kernel call. > > > > The servers have 256G memory and 72 (virtual) cores each. > > I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. > > > > I can try to switch back to 5.0.0 > > > > Thank you & kind regards, > > > > Heiner > > > > > > > > Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum > Scale service process not running on this node. Normal operation cannot be > done > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum > Scale service process is running > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is > not able to form a quorum with the other available nodes. > > Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 > [preauth] > > > > Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 23s! 
[rmmod:2695] > > > > Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 > i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > > Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe > mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp > crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca > dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] > > Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] > [] put_compound_page+0xc3/0x174 > > Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: > 00000246 > > Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: > 00000000fae3d201 RCX: 0000000000000284 > > Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: > 0000000000000246 RDI: ffffea003d478000 > > Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: > ffff881ffae3d1e0 R09: 0000000180800059 > > Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: > ffffea007feb8f40 R12: 00000000fae3d201 > > Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: > 0000000000000000 R15: ffff88161977bd40 > > Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > > > Jul 11 14:12:41 node-1.x.y kernel: Call Trace: > > Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] ? 
> kmem_cache_free+0x1e2/0x200 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:12:41 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff > ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 > f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 > 41 0f ba 2c > > > > Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. > Terminating. > > > > Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 21s! [rmmod:2695] > > > > Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > > Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on > CPUs/tasks: > > Jul 11 14:13:27 node-1.x.y kernel: { > > Jul 11 14:13:27 node-1.x.y kernel: 28 > > Jul 11 14:13:27 node-1.x.y kernel: } > > Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, > g=267734, c=267733, q=36089) > > Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: > > Jul 11 14:13:27 node-1.x.y kernel: rmmod R > > Jul 11 14:13:27 node-1.x.y kernel: running task > > Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __free_slab+0xdc/0x200 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > mmfs+0xc85/0xca0 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter > > Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl > lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul > mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa > pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: tracedev] > > Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] > [] __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: > 00000282 > > Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: > 0000000000000135 RCX: 00000000000001c1 > > Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: > 0000000000000246 RDI: ffffea00650e7040 > > Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: > ffff881ffae3df60 R09: 0000000180800052 > > Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: > ffffea007feb8f40 R12: ffff881ffae3df60 > > Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: > 00000000fae3db01 R15: ffffea007feb8f40 > > Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f > 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 > df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 > 48 89 fb f6 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 11 15:32:37 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 11 Jul 2018 14:32:37 +0000 Subject: [gpfsug-discuss] mmdiag --iohist question Message-ID: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacekm at img.cas.cz Thu Jul 12 07:46:22 2018 From: zacekm at img.cas.cz (Michal Zacek) Date: Thu, 12 Jul 2018 08:46:22 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory Message-ID: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal -------------- next part -------------- A non-text attachment was scrubbed... 
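On the "mmdiag --iohist" question above: an empty client field most likely means the I/O was issued locally by the NSD server itself, but that is an assumption worth confirming with IBM. Either way, the monitoring script can simply skip entries whose client field is not a dotted-quad address, for example with a pattern match instead of a fixed column number (column layout differs between releases, and header lines drop out automatically):

mmdiag --iohist | awk '$NF ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $NF }' | sort | uniq -c | sort -rn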
Name: smime.p7s Type: application/pkcs7-signature Size: 3776 bytes Desc: Elektronicky podpis S/MIME URL:
From S.J.Thompson at bham.ac.uk Thu Jul 12 09:04:11 2018
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Thu, 12 Jul 2018 08:04:11 +0000
Subject: Re: [gpfsug-discuss] File placement rule for new files in directory
In-Reply-To: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz>
References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz>
Message-ID: <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk>

Is ABCD a fileset? If so, it's easy with something like:

RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name')

Simon

On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote:

    Hello,
    is it possible to create a file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks.
    Best regards,
    Michal

From Renar.Grunenberg at huk-coburg.de Thu Jul 12 09:17:37 2018
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Thu, 12 Jul 2018 08:17:37 +0000
Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot
Message-ID: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de>

Hallo All,
after a reboot of two NSD servers we see that some disks in different filesystems are down, and we don't see why. The logs (messages, dmesg, kern, ...) say nothing. We are on RHEL 7.4 and SS 5.0.1.1. The question now: are there any logs or structures in the GPFS daemon that record this situation? What was the reason the daemon had no access to the disks during that startup phase? Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG Bahnhofsplatz 96444 Coburg
Telefon: 09561 96-44110 Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
________________________________
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From smita.raut at in.ibm.com Thu Jul 12 09:39:20 2018 From: smita.raut at in.ibm.com (Smita J Raut) Date: Thu, 12 Jul 2018 14:09:20 +0530 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Message-ID: If ABCD is not a fileset then below rule can be used- RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE ' /gpfs/gpfs01/ABCD/%' Thanks, Smita From: Simon Thompson To: gpfsug main discussion list Date: 07/12/2018 01:34 PM Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 12 09:40:06 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 12 Jul 2018 08:40:06 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Message-ID: <34BB4D15-5F76-453B-AC8C-FF5096133296@bham.ac.uk> How are the disks attached? We have some IB/SRP storage that is sometimes a little slow to appear in multipath and have seen this in the past (we since set autoload=off and always check multipath before restarting GPFS on the node). Simon From: on behalf of "Renar.Grunenberg at huk-coburg.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 12 July 2018 at 09:17 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacekm at img.cas.cz Thu Jul 12 09:49:38 2018 From: zacekm at img.cas.cz (Michal Zacek) Date: Thu, 12 Jul 2018 10:49:38 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Message-ID: <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> That's perfect, thank you both. Best regards Michal Dne 12.7.2018 v 10:39 Smita J Raut napsal(a): > If ABCD is not a fileset then below rule can be used- > > RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE > '/gpfs/gpfs01/ABCD/%' > > Thanks, > Smita > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 07/12/2018 01:34 PM > Subject: Re: [gpfsug-discuss] File placement rule for new files in > directory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Is ABCD a fileset? If so, its easy with something like: > > RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') > > Simon > > On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of zacekm at img.cas.cz" on behalf of zacekm at img.cas.cz> wrote: > > ? ?Hello, > > ? ?it is possible to create file placement policy for new files in one > ? ?directory? I need something like this --> All new files created in > ? ?directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". > ? ?Thanks. > > ? ?Best regards, > ? ?Michal > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3776 bytes Desc: Elektronicky podpis S/MIME URL: From Achim.Rehor at de.ibm.com Thu Jul 12 10:47:26 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Thu, 12 Jul 2018 11:47:26 +0200 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Thu Jul 12 11:01:29 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 12 Jul 2018 10:01:29 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Message-ID: <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> Hallo Achim, hallo Simon, first thanks for your answers. I think Achims answers map these at best. The nsd-servers (only 2) for these disk were mistakenly restart in a same time window. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Achim Rehor Gesendet: Donnerstag, 12. Juli 2018 11:47 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Hi Renar, whenever an access to a NSD happens, there is a potential that the node cannot access the disk, so if the (only) NSD server is down, there will be no chance to access the disk, and the disk will be set down. If you have twintailed disks, the 'second' (or possibly some more) NSD server will be asked, switching to networked access, and in that case only if that also fails, the disk will be set to down as well. Not sure how your setup is, but if you reboot 2 NSD servers, and some client possibly did IO to a file served by just these 2, then the 'down' state would be explainable. Rebooting of an NSD server should never set a disk to down, except, he was the only one serving that NSD. Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:image001.gif at 01D419D7.A9373E60] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 
7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 12/07/2018 10:17 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 7182 bytes Desc: image001.gif URL: From scale at us.ibm.com Thu Jul 12 12:33:39 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 12 Jul 2018 07:33:39 -0400 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> Message-ID: Just to follow up on the question about where to learn why a NSD is marked down you should see a message in the GPFS log, /var/adm/ras/mmfs.log.* Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Date: 07/12/2018 06:01 AM Subject: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Achim, hallo Simon, first thanks for your answers. I think Achims answers map these at best. The nsd-servers (only 2) for these disk were mistakenly restart in a same time window. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Achim Rehor Gesendet: Donnerstag, 12. 
Juli 2018 11:47 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Hi Renar, whenever an access to a NSD happens, there is a potential that the node cannot access the disk, so if the (only) NSD server is down, there will be no chance to access the disk, and the disk will be set down. If you have twintailed disks, the 'second' (or possibly some more) NSD server will be asked, switching to networked access, and in that case only if that also fails, the disk will be set to down as well. Not sure how your setup is, but if you reboot 2 NSD servers, and some client possibly did IO to a file served by just these 2, then the 'down' state would be explainable. Rebooting of an NSD server should never set a disk to down, except, he was the only one serving that NSD. Mit freundlichen Gr??en / Kind regards Achim Rehor Software Technical Support Specialist AIX/ Emea HPC Support IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 12/07/2018 10:17 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
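Pulling the advice in this thread together, a typical post-reboot check might look like the sketch below. The filesystem name is a placeholder, and mmchdisk should only be run once the underlying devices are confirmed healthy:

# Which disks does GPFS consider down or in error
mmlsdisk gpfs01 -e

# Why they were marked down; the daemon records it here
grep -iE "down|err" /var/adm/ras/mmfs.log.latest

# Confirm the block devices / multipath maps are visible on the NSD servers
multipath -ll

# As suggested earlier in the thread, consider keeping GPFS from starting before
# the paths are ready, e.g. mmchconfig autoload=no, and start it by hand after checking

# Bring the disks back up once the paths are confirmed
mmchdisk gpfs01 start -a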
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From UWEFALKE at de.ibm.com Thu Jul 12 14:16:23 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 12 Jul 2018 15:16:23 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz><8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> Message-ID: If that has not changed, then: PATH_NAME is not usable for placement policies. Only the FILESET_NAME attribute is accepted. One might think, that PATH_NAME is as known on creating a new file as is FILESET_NAME, but for some reason the documentation says: "When file attributes are referenced in initial placement rules, only the following attributes are valid: FILESET_NAME, GROUP_ID, NAME, and USER_ID. " Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 12/07/2018 10:49 Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org That's perfect, thank you both. Best regards Michal Dne 12.7.2018 v 10:39 Smita J Raut napsal(a): If ABCD is not a fileset then below rule can be used- RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE '/gpfs/gpfs01/ABCD/%' Thanks, Smita From: Simon Thompson To: gpfsug main discussion list Date: 07/12/2018 01:34 PM Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. 
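Given that restriction, the directory has to be a fileset for a placement rule to key on it. A sketch of the fileset-based approach for the example in this thread follows; the filesystem device name 'gpfs01' is assumed from the path, and the syntax should be checked against your release:

# Turn the directory into a fileset linked at that path
mmcrfileset gpfs01 ABCD
mmlinkfileset gpfs01 ABCD -J /gpfs/gpfs01/ABCD

# Placement policy: new files in the fileset land in 'fastdata',
# everything else keeps going to the default pool
cat > /tmp/placement.pol <<'EOF'
RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD')
RULE 'default' SET POOL 'system'
EOF
mmchpolicy gpfs01 /tmp/placement.pol -I yes

Note that mmlinkfileset needs a junction path that does not already exist, so any data already sitting in /gpfs/gpfs01/ABCD would have to be moved into the newly linked fileset afterwards.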
Best regards, Michal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "smime.p7s" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heiner.billich at psi.ch Thu Jul 12 14:30:43 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 12 Jul 2018 13:30:43 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Message-ID: <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Hello Sven, Thank you. I did enable numaMemorInterleave but the issues stays. In the meantime I switched to version 5.0.0-2 just to see if it?s version dependent ? it?s not. All gpfs filesystems are unmounted when this happens. At shutdown I often need to do a hard reset to force a reboot ? o.k., I never waited more than 5 minutes once I saw a hang, maybe it would recover after some more time. ?rmmod mmfs26? doesn?t hang all the times, maybe at every other shutdown or mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow, command like ?ps -efH? or ?history? take a long time and some mm commands just block, a few times the system gets completely inaccessible. I?ll reinstall the systems and move back to 4.2.3-8 and see if this is a stable configuration to start from an to rule out any hardware/BIOS issues. I append output from numactl -H below. Cheers, Heiner Test with 5.0.0-2 [root at xbl-ces-2 ~]# numactl -H available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 node 0 size: 130942 MB node 0 free: 60295 MB node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 node 1 size: 131072 MB node 1 free: 60042 MB node distances: node 0 1 0: 10 21 1: 21 10 [root at xbl-ces-2 ~]# mmdiag --config | grep numaM ! numaMemoryInterleave yes # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=/dev/mapper/vg_root-lv_root ro crashkernel=auto rd.lvm.lv=vg_root/lv_root console=tty0 console=ttyS0,115200 nosmap Example output of ps -efH during mmshutdown when rmmod did hang (last line) This is with 5.0.0-2. As I see all gpfs processe already terminated, just root 1 0 0 14:30 ? 00:00:10 /usr/lib/systemd/systemd --switched-root --system --deserialize 21 root 1035 1 0 14:30 ? 00:00:02 /usr/lib/systemd/systemd-journald root 1055 1 0 14:30 ? 00:00:00 /usr/sbin/lvmetad -f root 1072 1 0 14:30 ? 00:00:11 /usr/lib/systemd/systemd-udevd root 1478 1 0 14:31 ? 00:00:00 /usr/sbin/sssd -i -f root 1484 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain D.PSI.CH --uid 0 --gid 0 --debug-to-files root 1486 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files root 1487 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files root 1479 1 0 14:31 ? 00:00:00 /usr/sbin/rasdaemon -f -r root 1482 1 0 14:31 ? 00:00:04 /usr/sbin/irqbalance --foreground dbus 1483 1 0 14:31 ? 00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation root 1496 1 0 14:31 ? 
00:00:00 /usr/sbin/smartd -n -q never root 1498 1 0 14:31 ? 00:00:00 /usr/sbin/gssproxy -D nscd 1507 1 0 14:31 ? 00:00:01 /usr/sbin/nscd nrpe 1526 1 0 14:31 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d root 1531 1 0 14:31 ? 00:00:00 /usr/lib/systemd/systemd-logind root 1533 1 0 14:31 ? 00:00:00 /usr/sbin/rpc.gssd root 1803 1 0 14:31 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220 root 1804 1 0 14:31 tty1 00:00:00 /sbin/agetty --noclear tty1 linux root 2405 1 0 14:32 ? 00:00:00 /sbin/dhclient -q -cf /etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l root 2461 1 0 14:32 ? 00:00:00 /usr/sbin/sshd -D root 11561 2461 0 14:35 ? 00:00:00 sshd: root at pts/0 root 11565 11561 0 14:35 pts/0 00:00:00 -bash root 16024 11565 0 14:50 pts/0 00:00:05 ps -efH root 11609 2461 0 14:35 ? 00:00:00 sshd: root at pts/1 root 11644 11609 0 14:35 pts/1 00:00:00 -bash root 2718 1 0 14:32 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 0 no root 2758 1 0 14:32 ? 00:00:00 /usr/libexec/postfix/master -w postfix 2785 2758 0 14:32 ? 00:00:00 pickup -l -t unix -u postfix 2786 2758 0 14:32 ? 00:00:00 qmgr -l -t unix -u root 3174 1 0 14:32 ? 00:00:00 /usr/sbin/crond -n ntp 3179 1 0 14:32 ? 00:00:00 /usr/sbin/ntpd -u ntp:ntp -g root 3915 1 3 14:32 ? 00:00:33 python /usr/lpp/mmfs/bin/mmsysmon.py root 13618 1 0 14:36 ? 00:00:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes no root 15936 1 0 14:49 pts/1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs root 15992 15936 0 14:49 pts/1 00:00:00 /sbin/rmmod mmfs26 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday 11 July 2018 at 15:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown Hi, what does numactl -H report ? also check if this is set to yes : root at fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) > wrote: Hello, I have two nodes which hang on ?mmshutdown?, in detail the command ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I wonder if this looks familiar to somebody? Is it a known bug? I can avoid the issue if I reduce pagepool from 128G to 64G. Running ?systemctl stop gpfs? shows the same issue. It forcefully terminates after a while, but ?rmmod? stays stuck. Two functions cxiReleaseAndForgetPages and put_page seem to be involved, the first part of gpfs, the second a kernel call. The servers have 256G memory and 72 (virtual) cores each. I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. I can try to switch back to 5.0.0 Thank you & kind regards, Heiner Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum Scale service process is running Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is not able to form a quorum with the other available nodes. Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 [preauth] Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[rmmod:2695] Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] [] put_compound_page+0xc3/0x174 Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: 00000246 Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: 00000000fae3d201 RCX: 0000000000000284 Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: 0000000000000246 RDI: ffffea003d478000 Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: ffff881ffae3d1e0 R09: 0000000180800059 Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: ffffea007feb8f40 R12: 00000000fae3d201 Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: 0000000000000000 R15: ffff88161977bd40 Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:12:41 node-1.x.y kernel: Call Trace: Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 Jul 11 14:12:41 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] ? 
kmem_cache_free+0x1e2/0x200 Jul 11 14:12:41 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:12:41 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 41 0f ba 2c Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. Terminating. Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [rmmod:2695] Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jul 11 14:13:27 node-1.x.y kernel: { Jul 11 14:13:27 node-1.x.y kernel: 28 Jul 11 14:13:27 node-1.x.y kernel: } Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, g=267734, c=267733, q=36089) Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: Jul 11 14:13:27 node-1.x.y kernel: rmmod R Jul 11 14:13:27 node-1.x.y kernel: running task Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __free_slab+0xdc/0x200 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? 
system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] [] __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: 00000282 Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: 0000000000000135 RCX: 00000000000001c1 Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: 0000000000000246 RDI: ffffea00650e7040 Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: ffff881ffae3df60 R09: 0000000180800052 Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: ffffea007feb8f40 R12: ffff881ffae3df60 Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: 00000000fae3db01 R15: ffffea007feb8f40 Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 48 89 fb f6 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Thu Jul 12 14:40:15 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 12 Jul 2018 06:40:15 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Message-ID: if that happens it would be interesting what top reports start top in a large resolution window (like 330x80) , press shift-H , this will break it down per Thread, also press 1 to have a list of each cpu individually and see if you can either spot one core on the top list with 0% idle or on the thread list on the bottom if any of the threads run at 100% core speed. attached is a screenshot which columns to look at , this system is idle, so nothing to see, just to show you where to look does this machine by any chance has either large maxfilestochache or is a token server ? [image: image.png] sven On Thu, Jul 12, 2018 at 6:30 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello Sven, > > > > Thank you. I did enable numaMemorInterleave but the issues stays. > > > > In the meantime I switched to version 5.0.0-2 just to see if it?s version > dependent ? it?s not. All gpfs filesystems are unmounted when this happens. > > > > At shutdown I often need to do a hard reset to force a reboot ? o.k., I > never waited more than 5 minutes once I saw a hang, maybe it would recover > after some more time. > > > > ?rmmod mmfs26? doesn?t hang all the times, maybe at every other shutdown > or mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow, > command like ?ps -efH? or ?history? take a long time and some mm commands > just block, a few times the system gets completely inaccessible. > > > > I?ll reinstall the systems and move back to 4.2.3-8 and see if this is a > stable configuration to start from an to rule out any hardware/BIOS issues. > > > > I append output from numactl -H below. > > > > Cheers, > > > > Heiner > > > > Test with 5.0.0-2 > > > > [root at xbl-ces-2 ~]# numactl -H > > available: 2 nodes (0-1) > > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 > 42 43 44 45 46 47 48 49 50 51 52 53 > > node 0 size: 130942 MB > > node 0 free: 60295 MB > > node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 > 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 > > node 1 size: 131072 MB > > node 1 free: 60042 MB > > node distances: > > node 0 1 > > 0: 10 21 > > 1: 21 10 > > > > [root at xbl-ces-2 ~]# mmdiag --config | grep numaM > > ! numaMemoryInterleave yes > > > > # cat /proc/cmdline > > BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 > root=/dev/mapper/vg_root-lv_root ro crashkernel=auto rd.lvm.lv=vg_root/lv_root > console=tty0 console=ttyS0,115200 nosmap > > > > > > Example output of ps -efH during mmshutdown when rmmod did hang (last > line) This is with 5.0.0-2. As I see all gpfs processe already terminated, > just > > > > root 1 0 0 14:30 ? 00:00:10 /usr/lib/systemd/systemd > --switched-root --system --deserialize 21 > > root 1035 1 0 14:30 ? 00:00:02 > /usr/lib/systemd/systemd-journald > > root 1055 1 0 14:30 ? 00:00:00 /usr/sbin/lvmetad -f > > root 1072 1 0 14:30 ? 00:00:11 > /usr/lib/systemd/systemd-udevd > > root 1478 1 0 14:31 ? 00:00:00 /usr/sbin/sssd -i -f > > root 1484 1478 0 14:31 ? 00:00:00 > /usr/libexec/sssd/sssd_be --domain D.PSI.CH --uid 0 --gid 0 > --debug-to-files > > root 1486 1478 0 14:31 ? 
00:00:00 > /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files > > root 1487 1478 0 14:31 ? 00:00:00 > /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files > > root 1479 1 0 14:31 ? 00:00:00 /usr/sbin/rasdaemon -f -r > > root 1482 1 0 14:31 ? 00:00:04 /usr/sbin/irqbalance > --foreground > > dbus 1483 1 0 14:31 ? 00:00:00 /bin/dbus-daemon > --system --address=systemd: --nofork --nopidfile --systemd-activation > > root 1496 1 0 14:31 ? 00:00:00 /usr/sbin/smartd -n -q > never > > root 1498 1 0 14:31 ? 00:00:00 /usr/sbin/gssproxy -D > > nscd 1507 1 0 14:31 ? 00:00:01 /usr/sbin/nscd > > nrpe 1526 1 0 14:31 ? 00:00:00 /usr/sbin/nrpe -c > /etc/nagios/nrpe.cfg -d > > root 1531 1 0 14:31 ? 00:00:00 > /usr/lib/systemd/systemd-logind > > root 1533 1 0 14:31 ? 00:00:00 /usr/sbin/rpc.gssd > > root 1803 1 0 14:31 ttyS0 00:00:00 /sbin/agetty --keep-baud > 115200 38400 9600 ttyS0 vt220 > > root 1804 1 0 14:31 tty1 00:00:00 /sbin/agetty --noclear > tty1 linux > > root 2405 1 0 14:32 ? 00:00:00 /sbin/dhclient -q -cf > /etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l > > root 2461 1 0 14:32 ? 00:00:00 /usr/sbin/sshd -D > > root 11561 2461 0 14:35 ? 00:00:00 sshd: root at pts/0 > > root 11565 11561 0 14:35 pts/0 00:00:00 -bash > > root 16024 11565 0 14:50 pts/0 00:00:05 ps -efH > > root 11609 2461 0 14:35 ? 00:00:00 sshd: root at pts/1 > > root 11644 11609 0 14:35 pts/1 00:00:00 -bash > > root 2718 1 0 14:32 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh > /usr/lpp/mmfs/bin/mmccrmonitor 15 0 no > > root 2758 1 0 14:32 ? 00:00:00 > /usr/libexec/postfix/master -w > > postfix 2785 2758 0 14:32 ? 00:00:00 pickup -l -t unix -u > > postfix 2786 2758 0 14:32 ? 00:00:00 qmgr -l -t unix -u > > root 3174 1 0 14:32 ? 00:00:00 /usr/sbin/crond -n > > ntp 3179 1 0 14:32 ? 00:00:00 /usr/sbin/ntpd -u > ntp:ntp -g > > root 3915 1 3 14:32 ? 00:00:33 python > /usr/lpp/mmfs/bin/mmsysmon.py > > root 13618 1 0 14:36 ? 00:00:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes > no > > root 15936 1 0 14:49 pts/1 00:00:00 /usr/lpp/mmfs/bin/mmksh > /usr/lpp/mmfs/bin/runmmfs > > root 15992 15936 0 14:49 pts/1 00:00:00 /sbin/rmmod mmfs26 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > *From: * on behalf of Sven > Oehme > *Reply-To: *gpfsug main discussion list > *Date: *Wednesday 11 July 2018 at 15:47 > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown > > > > Hi, > > > > what does numactl -H report ? > > > > also check if this is set to yes : > > > > root at fab3a:~# mmlsconfig numaMemoryInterleave > > numaMemoryInterleave yes > > > > Sven > > > > On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < > heiner.billich at psi.ch> wrote: > > Hello, > > > > I have two nodes which hang on ?mmshutdown?, in detail the command > ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I > wonder if this looks familiar to somebody? Is it a known bug? I can avoid > the issue if I reduce pagepool from 128G to 64G. > > > > Running ?systemctl stop gpfs? shows the same issue. It forcefully > terminates after a while, but ?rmmod? stays stuck. > > > > Two functions cxiReleaseAndForgetPages and put_page seem to be involved, > the first part of gpfs, the second a kernel call. > > > > The servers have 256G memory and 72 (virtual) cores each. > > I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. 
> > > > I can try to switch back to 5.0.0 > > > > Thank you & kind regards, > > > > Heiner > > > > > > > > Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum > Scale service process not running on this node. Normal operation cannot be > done > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum > Scale service process is running > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is > not able to form a quorum with the other available nodes. > > Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 > [preauth] > > > > Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 23s! [rmmod:2695] > > > > Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 > i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > > Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe > mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp > crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca > dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] > > Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] > [] put_compound_page+0xc3/0x174 > > Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: > 00000246 > > Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: > 00000000fae3d201 RCX: 0000000000000284 > > Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: > 0000000000000246 RDI: ffffea003d478000 > > Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: > ffff881ffae3d1e0 R09: 0000000180800059 > > Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: > ffffea007feb8f40 R12: 00000000fae3d201 > > Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: > 0000000000000000 R15: ffff88161977bd40 > > Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > > > Jul 11 
14:12:41 node-1.x.y kernel: Call Trace: > > Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] ? > kmem_cache_free+0x1e2/0x200 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:12:41 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff > ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 > f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 > 41 0f ba 2c > > > > Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. > Terminating. > > > > Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 21s! [rmmod:2695] > > > > Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > > Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on > CPUs/tasks: > > Jul 11 14:13:27 node-1.x.y kernel: { > > Jul 11 14:13:27 node-1.x.y kernel: 28 > > Jul 11 14:13:27 node-1.x.y kernel: } > > Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, > g=267734, c=267733, q=36089) > > Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: > > Jul 11 14:13:27 node-1.x.y kernel: rmmod R > > Jul 11 14:13:27 node-1.x.y kernel: running task > > Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __free_slab+0xdc/0x200 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> mmfs+0xc85/0xca0 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter > > Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl > lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul > mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa > pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: tracedev] > > Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] > [] __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: > 00000282 > > Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: > 0000000000000135 RCX: 00000000000001c1 > > Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: > 0000000000000246 RDI: ffffea00650e7040 > > Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: > ffff881ffae3df60 R09: 0000000180800052 > > Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: > ffffea007feb8f40 R12: ffff881ffae3df60 > > Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: > 00000000fae3db01 R15: ffffea007feb8f40 > > Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f > 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 > df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 > 48 89 fb f6 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 643176 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Jul 12 15:47:00 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 12 Jul 2018 10:47:00 -0400 Subject: [gpfsug-discuss] File placement rule for new files in directory - PATH_NAME In-Reply-To: References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz><8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk><3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> Message-ID: Why no path name in SET POOL rule? Maybe more than one reason, but consider, that in Unix, the API has the concept of "current directory" and "create a file in the current directory" AND another process or thread may at any time rename (mv!) any directory... So even it you "think" you know the name of the directory in which you are creating a file, you really don't know for sure! So, you may ask, how does the command /bin/pwd work? It follows the parent inode field of each inode, searches the parent for a matching inode, stashes the name in a buffer... When it reaches the root, it prints out the apparent path it found to the root... Which could be wrong by the time it reaches the root! For example: [root@~/gpfs-git]$mkdir -p /tmp/a/b/c/d [root@~/gpfs-git]$cd /tmp/a/b/c/d [root at .../c/d]$/bin/pwd /tmp/a/b/c/d [root at .../c/d]$pwd /tmp/a/b/c/d [root at .../c/d]$mv /tmp/a/b /tmp/a/b2 [root at .../c/d]$pwd /tmp/a/b/c/d # Bash still "thinks" it is in /tmp/a/b/c/d [root at .../c/d]$/bin/pwd /tmp/a/b2/c/d # But /bin/pwd knows better -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From heiner.billich at psi.ch Thu Jul 12 16:21:50 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 12 Jul 2018 15:21:50 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Message-ID: <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> Hello Sven, The machine has maxFilesToCache 204800 (2M) it will become a CES node, hence the higher than default value. It?s just a 3 node cluster with remote cluster mount and no activity (yet). But all three nodes are listed as token server by ?mmdiag ?tokenmgr?. Top showed 100% idle on core 55. This matches the kernel messages about rmmod being stuck on core 55. I didn?t see a dominating thread/process, but many kernel threads showed 30-40% CPU, in sum that used about 50% of all cpu available. This time mmshutdown did return and left the module loaded, next mmstartup tried to remove the ?old? module and got stuck :-( I append two links to screenshots Thank you, Heiner https://pasteboard.co/Hu86DKf.png https://pasteboard.co/Hu86rg4.png If the links don?t work I can post the images to the list. Kernel messages: [ 857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 [ 857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 [ 857.884938] task: ffff883ffafe8fd0 ti: ffff88342af30000 task.ti: ffff88342af30000 [ 857.924120] RIP: 0010:[] [] compound_unlock_irqrestore+0xe/0x20 [ 857.970708] RSP: 0018:ffff88342af33d38 EFLAGS: 00000246 [ 857.999742] RAX: 0000000000000000 RBX: ffff88207ffda068 RCX: 00000000000000e5 [ 858.037165] RDX: 0000000000000246 RSI: 0000000000000246 RDI: 0000000000000246 [ 858.074416] RBP: ffff88342af33d38 R08: 0000000000000000 R09: 0000000000000000 [ 858.111519] R10: ffff88207ffcfac0 R11: ffffea00fff40280 R12: 0000000000000200 [ 858.148421] R13: 00000001fff40280 R14: ffffffff8118cd84 R15: ffff88342af33ce8 [ 858.185845] FS: 00007fc797d1e740(0000) GS:ffff883fff0c0000(0000) knlGS:0000000000000000 [ 858.227062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 858.257819] CR2: 00000000004116d0 CR3: 0000003fc2ec0000 CR4: 00000000001607e0 [ 858.295143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 858.332145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 858.369097] Call Trace: [ 858.384829] [] put_compound_page+0x149/0x174 [ 858.416176] [] put_page+0x45/0x50 [ 858.443185] [] cxiReleaseAndForgetPages+0xda/0x220 [mmfslinux] [ 858.481751] [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux] [ 858.518206] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 858.554438] [] ? 
_raw_spin_lock+0x10/0x30 [ 858.585522] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 858.622670] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 858.659246] [] mmfs+0xc85/0xca0 [mmfs26] [ 858.689379] [] gpfs_clean+0x26/0x30 [mmfslinux] [ 858.722330] [] cleanup_module+0x25/0x30 [mmfs26] [ 858.755431] [] SyS_delete_module+0x19b/0x300 [ 858.786882] [] system_call_fastpath+0x16/0x1b [ 858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 13 5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 [ 859.068528] hrtimer: interrupt took 2877171 ns [ 870.517924] INFO: rcu_sched self-detected stall on CPU { 55} (t=240003 jiffies g=18437 c=18436 q=194992) [ 870.577882] Task dump for CPU 55: [ 870.602837] rmmod R running task 0 16429 16374 0x00000008 [ 870.645206] Call Trace: [ 870.666388] [] sched_show_task+0xa8/0x110 [ 870.704271] [] dump_cpu_task+0x39/0x70 [ 870.738421] [] rcu_dump_cpu_stacks+0x90/0xd0 [ 870.775339] [] rcu_check_callbacks+0x442/0x730 [ 870.812353] [] ? tick_sched_do_timer+0x50/0x50 [ 870.848875] [] update_process_times+0x46/0x80 [ 870.884847] [] tick_sched_handle+0x30/0x70 [ 870.919740] [] tick_sched_timer+0x39/0x80 [ 870.953660] [] __hrtimer_run_queues+0xd4/0x260 [ 870.989276] [] hrtimer_interrupt+0xaf/0x1d0 [ 871.023481] [] local_apic_timer_interrupt+0x35/0x60 [ 871.061233] [] smp_apic_timer_interrupt+0x3d/0x50 [ 871.097838] [] apic_timer_interrupt+0x232/0x240 [ 871.133232] [] ? put_page_testzero+0x8/0x15 [ 871.170089] [] put_compound_page+0x151/0x174 [ 871.204221] [] put_page+0x45/0x50 [ 871.234554] [] cxiReleaseAndForgetPages+0xda/0x220 [mmfslinux] [ 871.275763] [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux] [ 871.316987] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 871.468753] [] mmfs+0xc85/0xca0 [mmfs26] [ 871.501196] [] gpfs_clean+0x26/0x30 [mmfslinux] [ 871.536562] [] cleanup_module+0x25/0x30 [mmfs26] [ 871.572110] [] SyS_delete_module+0x19b/0x300 [ 871.606048] [] system_call_fastpath+0x16/0x1b -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Thursday 12 July 2018 at 15:42 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown if that happens it would be interesting what top reports start top in a large resolution window (like 330x80) , press shift-H , this will break it down per Thread, also press 1 to have a list of each cpu individually and see if you can either spot one core on the top list with 0% idle or on the thread list on the bottom if any of the threads run at 100% core speed. attached is a screenshot which columns to look at , this system is idle, so nothing to see, just to show you where to look does this machine by any chance has either large maxfilestochache or is a token server ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Thu Jul 12 16:30:43 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 12 Jul 2018 08:30:43 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> Message-ID: Hi, the problem is the cleanup of the tokens and/or the openfile objects. i suggest you open a defect for this. sven On Thu, Jul 12, 2018 at 8:22 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > > > > > Hello Sven, > > > > The machine has > > > > maxFilesToCache 204800 (2M) > > > > it will become a CES node, hence the higher than default value. It?s just > a 3 node cluster with remote cluster mount and no activity (yet). But all > three nodes are listed as token server by ?mmdiag ?tokenmgr?. > > > > Top showed 100% idle on core 55. This matches the kernel messages about > rmmod being stuck on core 55. > > I didn?t see a dominating thread/process, but many kernel threads showed > 30-40% CPU, in sum that used about 50% of all cpu available. > > > > This time mmshutdown did return and left the module loaded, next mmstartup > tried to remove the ?old? module and got stuck :-( > > > > I append two links to screenshots > > > > Thank you, > > > > Heiner > > > > https://pasteboard.co/Hu86DKf.png > > https://pasteboard.co/Hu86rg4.png > > > > If the links don?t work I can post the images to the list. > > > > Kernel messages: > > > > [ 857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: G W OEL > ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > [ 857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, > BIOS P89 01/22/2018 > > [ 857.884938] task: ffff883ffafe8fd0 ti: ffff88342af30000 task.ti: > ffff88342af30000 > > [ 857.924120] RIP: 0010:[] [] > compound_unlock_irqrestore+0xe/0x20 > > [ 857.970708] RSP: 0018:ffff88342af33d38 EFLAGS: 00000246 > > [ 857.999742] RAX: 0000000000000000 RBX: ffff88207ffda068 RCX: > 00000000000000e5 > > [ 858.037165] RDX: 0000000000000246 RSI: 0000000000000246 RDI: > 0000000000000246 > > [ 858.074416] RBP: ffff88342af33d38 R08: 0000000000000000 R09: > 0000000000000000 > > [ 858.111519] R10: ffff88207ffcfac0 R11: ffffea00fff40280 R12: > 0000000000000200 > > [ 858.148421] R13: 00000001fff40280 R14: ffffffff8118cd84 R15: > ffff88342af33ce8 > > [ 858.185845] FS: 00007fc797d1e740(0000) GS:ffff883fff0c0000(0000) > knlGS:0000000000000000 > > [ 858.227062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 858.257819] CR2: 00000000004116d0 CR3: 0000003fc2ec0000 CR4: > 00000000001607e0 > > [ 858.295143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > > [ 858.332145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > > [ 858.369097] Call Trace: > > [ 858.384829] [] put_compound_page+0x149/0x174 > > [ 858.416176] [] put_page+0x45/0x50 > > [ 858.443185] [] cxiReleaseAndForgetPages+0xda/0x220 > [mmfslinux] > > [ 858.481751] [] ? cxiDeallocPageList+0xbd/0x110 > [mmfslinux] > > [ 858.518206] [] cxiDeallocPageList+0x45/0x110 > [mmfslinux] > > [ 858.554438] [] ? 
_raw_spin_lock+0x10/0x30 > > [ 858.585522] [] cxiFreeSharedMemory+0x12a/0x130 > [mmfslinux] > > [ 858.622670] [] kxFreeAllSharedMemory+0xe2/0x160 > [mmfs26] > > [ 858.659246] [] mmfs+0xc85/0xca0 [mmfs26] > > [ 858.689379] [] gpfs_clean+0x26/0x30 [mmfslinux] > > [ 858.722330] [] cleanup_module+0x25/0x30 [mmfs26] > > [ 858.755431] [] SyS_delete_module+0x19b/0x300 > > [ 858.786882] [] system_call_fastpath+0x16/0x1b > > [ 858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 > 13 5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d > <0f> 1f 44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 > > [ 859.068528] hrtimer: interrupt took 2877171 ns > > [ 870.517924] INFO: rcu_sched self-detected stall on CPU { 55} (t=240003 > jiffies g=18437 c=18436 q=194992) > > [ 870.577882] Task dump for CPU 55: > > [ 870.602837] rmmod R running task 0 16429 16374 > 0x00000008 > > [ 870.645206] Call Trace: > > [ 870.666388] [] sched_show_task+0xa8/0x110 > > [ 870.704271] [] dump_cpu_task+0x39/0x70 > > [ 870.738421] [] rcu_dump_cpu_stacks+0x90/0xd0 > > [ 870.775339] [] rcu_check_callbacks+0x442/0x730 > > [ 870.812353] [] ? tick_sched_do_timer+0x50/0x50 > > [ 870.848875] [] update_process_times+0x46/0x80 > > [ 870.884847] [] tick_sched_handle+0x30/0x70 > > [ 870.919740] [] tick_sched_timer+0x39/0x80 > > [ 870.953660] [] __hrtimer_run_queues+0xd4/0x260 > > [ 870.989276] [] hrtimer_interrupt+0xaf/0x1d0 > > [ 871.023481] [] local_apic_timer_interrupt+0x35/0x60 > > [ 871.061233] [] smp_apic_timer_interrupt+0x3d/0x50 > > [ 871.097838] [] apic_timer_interrupt+0x232/0x240 > > [ 871.133232] [] ? put_page_testzero+0x8/0x15 > > [ 871.170089] [] put_compound_page+0x151/0x174 > > [ 871.204221] [] put_page+0x45/0x50 > > [ 871.234554] [] cxiReleaseAndForgetPages+0xda/0x220 > [mmfslinux] > > [ 871.275763] [] ? cxiDeallocPageList+0xbd/0x110 > [mmfslinux] > > [ 871.316987] [] cxiDeallocPageList+0x45/0x110 > [mmfslinux] > > [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 > > [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 > [mmfslinux] > > [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 > [mmfs26] > > [ 871.468753] [] mmfs+0xc85/0xca0 [mmfs26] > > [ 871.501196] [] gpfs_clean+0x26/0x30 [mmfslinux] > > [ 871.536562] [] cleanup_module+0x25/0x30 [mmfs26] > > [ 871.572110] [] SyS_delete_module+0x19b/0x300 > > [ 871.606048] [] system_call_fastpath+0x16/0x1b > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > *From: * on behalf of Sven > Oehme > > > *Reply-To: *gpfsug main discussion list > > *Date: *Thursday 12 July 2018 at 15:42 > > > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown > > > > if that happens it would be interesting what top reports > > > > start top in a large resolution window (like 330x80) , press shift-H , > this will break it down per Thread, also press 1 to have a list of each cpu > individually and see if you can either spot one core on the top list with > 0% idle or on the thread list on the bottom if any of the threads run at > 100% core speed. > > attached is a screenshot which columns to look at , this system is idle, > so nothing to see, just to show you where to look > > > > does this machine by any chance has either large maxfilestochache or is a > token server ? 
> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Fri Jul 13 11:07:25 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 13 Jul 2018 10:07:25 +0000 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Message-ID: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 4s.png Type: image/png Size: 129914 bytes Desc: 4s.png URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Jul 15 18:24:43 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 15 Jul 2018 17:24:43 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? Message-ID: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... 
Scanning file system metadata, phase 1 ...

There are no waiters of any significance and 'mmdiag --iohist' doesn't show any issues either. Any ideas, anyone? Unless I can figure this out I'm hosed for this downtime, as I've got 7 more arrays to do after this one!

Thanks! -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From aaron.s.knister at nasa.gov Sun Jul 15 18:34:45 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Sun, 15 Jul 2018 17:34:45 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? In-Reply-To: References: Message-ID: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov>

Hmm...have you dumped waiters across the entire cluster or just on the NSD servers/fs managers?
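One way to do that across every node at once -- just a sketch, assuming mmdsh can reach all nodes and GPFS is installed in the default location:

  # collect the current waiters from all nodes, not only the NSD servers / fs manager
  mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --waiters | grep -v '==='
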
Maybe there?s a slow node out there participating in the suspend effort? Might be worth running some quick tracing on the FS manager to see what it?s up to. On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L > wrote: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd518db52846a4be34e2208d5ea7a00d7%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636672732087040757&sdata=m77IpWNOlODc%2FzLiYI2qiPo9Azs8qsIdXSY8%2FoC6Nn0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Thu Jul 19 15:05:39 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Thu, 19 Jul 2018 14:05:39 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Message-ID: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. 
M?nchen URL: http://www.lrz.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jul 19 15:23:42 2018 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 19 Jul 2018 10:23:42 -0400 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> References: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> Message-ID: Hi Stephan: I think every node in C1 and in C2 have to see every node in the server cluster NSD-[AD]. We have a 10 node server cluster where 2 nodes do nothing but server out nfs. Since these two are apart of the server cluster...client clusters wanting to mount the server cluster via gpfs need to see them. I think both OPA fabfics need to be on all 4 of your server nodes. Eric On Thu, Jul 19, 2018 at 10:05 AM, Peinkofer, Stephan < Stephan.Peinkofer at lrz.de> wrote: > Dear GPFS List, > > does anyone of you know, if it is possible to have multiple file systems > in a GPFS Cluster that all are served primary via Ethernet but for which > different ?booster? connections to various IB/OPA fabrics exist. > > For example let?s say in my central Storage/NSD Cluster, I implement two > file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is > served by NSD-C and NSD-D. > Now I have two client Clusters C1 and C2 which have different OPA fabrics. > Both Clusters can mount the two file systems via Ethernet, but I now add > OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for > NSD-C and NSD-D to C2?s fabric and just switch on RDMA. > As far as I understood, GPFS will use RDMA if it is available between two > nodes but switch to Ethernet if RDMA is not available between the two > nodes. So given just this, the above scenario could work in principle. But > will it work in reality and will it be supported by IBM? > > Many thanks in advance. > Best Regards, > Stephan Peinkofer > -- > Stephan Peinkofer > Leibniz Supercomputing Centre > Data and Storage Division > Boltzmannstra?e 1, 85748 Garching b. M?nchen > URL: http://www.lrz.de > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 19 16:42:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Jul 2018 15:42:48 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Message-ID: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> I think what you want is to use fabric numbers with verbsPorts, e.g. we have two IB fabrics and in the config we do thinks like: [nodeclass1] verbsPorts mlx4_0/1/1 [nodeclass2] verbsPorts mlx5_0/1/3 GPFS recognises the /1 or /3 at the end as a fabric number and knows they are separate and will Ethernet between those nodes instead. Simon From: on behalf of "Stephan.Peinkofer at lrz.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 July 2018 at 15:13 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. 
For example let's say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D.
Now I have two client Clusters C1 and C2 which have different OPA fabrics.
Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1's fabric and OPA connections for NSD-C and NSD-D to C2's fabric and just switch on RDMA.
As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM?

Many thanks in advance.
Best Regards,
Stephan Peinkofer
--
Stephan Peinkofer
Leibniz Supercomputing Centre
Data and Storage Division
Boltzmannstraße 1, 85748 Garching b. München
URL: http://www.lrz.de

From aaron.s.knister at nasa.gov Thu Jul 19 17:54:22 2018
From: aaron.s.knister at nasa.gov (Aaron Knister)
Date: Thu, 19 Jul 2018 12:54:22 -0400
Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster
In-Reply-To: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk>
References: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk>
Message-ID:

To add to the excellent advice others have already provided, I think you have fundamentally 2 choices:

- Establish additional OPA connections from NSD-A and NSD-B to cluster C2 and from NSD-C and NSD-D to cluster C1

*or*

- Add NSD-A and NSD-B as NSD servers for the NSDs for FS2 and add NSD-C and NSD-D as NSD servers for the NSDs for FS1. (Note: If you're running Scale 5.0 you can change the NSD server list with the FS available and mounted, else you'll need an outage to unmount the FS and change the NSD server list.)

It's a matter of what's preferable (easier, cheaper, etc.) - adding OPA connections to the NSD servers or adding additional LUN presentations (which may involve SAN connections, of course) to the NSD servers. In our environment we do the latter and it works very well for us.

-Aaron

On 7/19/18 11:42 AM, Simon Thompson wrote:
> I think what you want is to use fabric numbers with verbsPorts, e.g. we
> have two IB fabrics and in the config we do things like:
>
> [nodeclass1]
> verbsPorts mlx4_0/1/1
>
> [nodeclass2]
> verbsPorts mlx5_0/1/3
>
> GPFS recognises the /1 or /3 at the end as a fabric number and knows
> they are separate and will use Ethernet between those nodes instead.
>
> Simon
>
> *From: * on behalf of "Stephan.Peinkofer at lrz.de"
> *Reply-To: *"gpfsug-discuss at spectrumscale.org"
> *Date: *Thursday, 19 July 2018 at 15:13
> *To: *"gpfsug-discuss at spectrumscale.org"
> *Subject: *[gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster
>
> Dear GPFS List,
>
> does anyone of you know if it is possible to have multiple file systems
> in a GPFS Cluster that all are served primarily via Ethernet but for which
> different 'booster' connections to various IB/OPA fabrics exist.
>
> For example let's say in my central Storage/NSD Cluster, I implement two
> file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is
> served by NSD-C and NSD-D.
>
> Now I have two client Clusters C1 and C2 which have different OPA
> fabrics.
Both Clusters can mount the two file systems via Ethernet, but > I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA > connections for NSD-C and NSD-D to ?C2?s fabric and just switch on RDMA. > > As far as I understood, GPFS will use RDMA if it is available between > two nodes but switch to Ethernet if RDMA is not available between the > two nodes. So given just this, the above scenario could work in > principle. But will it work in reality and will it be supported by IBM? > > Many thanks in advance. > > Best Regards, > > Stephan Peinkofer > > -- > Stephan Peinkofer > Leibniz Supercomputing Centre > Data and Storage Division > Boltzmannstra?e 1, 85748 Garching b.?M?nchen > URL: http://www.lrz.de > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From valdis.kletnieks at vt.edu Thu Jul 19 22:25:23 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 19 Jul 2018 17:25:23 -0400 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? Message-ID: <25435.1532035523@turing-police.cc.vt.edu> So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error(archive/system) (...) Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn(archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jul 19 23:23:06 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jul 2018 22:23:06 +0000 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? 
In-Reply-To: <25435.1532035523@turing-police.cc.vt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> Hi Valdis, Is this what you?re looking for (from an IBMer in response to another question a few weeks back)? assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 19, 2018, at 4:25 PM, valdis.kletnieks at vt.edu wrote: So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error(archive/system) (...) 
Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn(archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ca2e808fa12e74ed277bc08d5edc51bc3%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636676353194563950&sdata=5biJuM0K0XwEw3BMwbS5epNQhrlig%2FFON7k1V79G%2Fyc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Fri Jul 20 07:39:24 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 20 Jul 2018 06:39:24 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> References: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> Message-ID: <05cf5689138043da8321b728f320834c@lrz.de> Dear Simon and List, thanks. That was exactly I was looking for. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Thursday, July 19, 2018 5:42 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster I think what you want is to use fabric numbers with verbsPorts, e.g. we have two IB fabrics and in the config we do thinks like: [nodeclass1] verbsPorts mlx4_0/1/1 [nodeclass2] verbsPorts mlx5_0/1/3 GPFS recognises the /1 or /3 at the end as a fabric number and knows they are separate and will Ethernet between those nodes instead. Simon From: on behalf of "Stephan.Peinkofer at lrz.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 July 2018 at 15:13 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. 
Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de LRZ: Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften www.lrz.de Das LRZ ist das Rechenzentrum f?r die M?nchner Universit?ten, die Bayerische Akademie der Wissenschaften sowie nationales Zentrum f?r Hochleistungsrechnen. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 20 09:29:29 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 20 Jul 2018 16:29:29 +0800 Subject: [gpfsug-discuss] mmfsadddisk command interrupted In-Reply-To: References: Message-ID: Hi Damir, Since many GPFS management command got unresponsive and you are running ESS, mail-list maybe not a good way to track this kinds of issue. Could you please raise a ticket to ESS/SpectrumScale to get help from IBM Service team? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Damir Krstic To: gpfsug main discussion list Date: 06/23/2018 03:04 AM Subject: [gpfsug-discuss] mmfsadddisk command interrupted Sent by: gpfsug-discuss-bounces at spectrumscale.org We were adding disks to one of our larger filesystems today. During the "checking allocation map for storage pool system" we had to interrupt the command since it was causing slow downs on our filesystem. Now commands like mmrepquota, mmdf, etc. are timing out with tsaddisk command is running message. Also during the run of the mmdf, mmrepquota, etc. filesystem becomes completely unresponsive. This command was run on ESS running version 5.2.0. Any help is much appreciated. Thank you. Damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From YARD at il.ibm.com Sat Jul 21 21:22:47 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sat, 21 Jul 2018 23:22:47 +0300 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi Do u run mmbackup on snapshot , which is read only ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage)   Petach Tiqva, 49527
IBM Global Markets, Systems HW Sales   Israel

Phone: +972-3-916-5672
Fax: +972-3-916-5672
Mobile: +972-52-8395593
e-mail: yard at il.ibm.com
IBM Israel

From: Peter Childs
To: "gpfsug-discuss at spectrumscale.org"
Date: 07/10/2018 05:51 PM
Subject: [gpfsug-discuss] Same file opened by many nodes / processes
Sent by: gpfsug-discuss-bounces at spectrumscale.org

We have a situation where the same file is being read by around 5000 "jobs"; this is an array job in UGE with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time.

It's a ~200GB file, so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes.

This is read only access to the file; I don't know the specifics about the job.

It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file).

I'm wondering if there is anything we can do to improve things or that can be tuned within GPFS. I don't think we have an issue with token management, but would increasing maxFilesToCache on our token manager node help, say?

Is there anything else I should look at, to try to allow GPFS to share this file better.

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From p.childs at qmul.ac.uk Sun Jul 22 12:26:35 2018
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Sun, 22 Jul 2018 11:26:35 +0000
Subject: [gpfsug-discuss] Same file opened by many nodes / processes
In-Reply-To:
References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>
Message-ID:

Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week has been taking many hours (I saw it take 12 last Tuesday).

It's sped up again now, back to its normal hour, but the high IO jobs accessing the same file from many nodes also look to have come to an end for the time being.
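In case it is useful to anyone reading this in the archive later, the rough shape of what our backup job does is something like the following; the file system name, snapshot name and TSM server here are illustrative rather than copied from the real script:

# take a snapshot, run mmbackup against it, then drop the snapshot
SNAP=mmbackup_$(date +%Y%m%d)
/usr/lpp/mmfs/bin/mmcrsnapshot gpfs $SNAP
/usr/lpp/mmfs/bin/mmbackup gpfs -S $SNAP -t incremental --tsm-servers TSMSERVER1
/usr/lpp/mmfs/bin/mmdelsnapshot gpfs $SNAP

Pointing mmbackup at the snapshot with -S means the policy scan works from a consistent, read-only view of the file system rather than the live tree.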
I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0C9372140C936C60006FF189C22582D1] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_0C9306EC0C92FECC006FF189C22582D1][cid:_1_0C9308F40C92FECC006FF189C22582D1] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: ATT00001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: ATT00002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 5093 bytes Desc: ATT00003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 4746 bytes Desc: ATT00004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ATT00005.gif Type: image/gif Size: 4557 bytes Desc: ATT00005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 5093 bytes Desc: ATT00006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.jpg Type: image/jpeg Size: 11294 bytes Desc: ATT00007.jpg URL: From jose.filipe.higino at gmail.com Sun Jul 22 13:51:03 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 00:51:03 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00005.gif Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.jpg Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: not available URL: From scale at us.ibm.com Mon Jul 23 04:06:33 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 23 Jul 2018 11:06:33 +0800 Subject: [gpfsug-discuss] -o syncnfs has no effect? In-Reply-To: References: Message-ID: Hi, mmchfs Device -o syncnfs is the correct way of setting the syncnfs so that it applies to the file system both on the home and the remote cluster On 4.2.3+ syncnfs is the default option on Linux . Which means GPFS will implement the syncnfs behavior regardless of what the mount command says The documentation indicates that mmmount Device -o syncnfs=yes appears to be the correct syntax. When I tried that, I do see 'syncnfs=yes' in the output of the 'mount' command To change the remote mount option so that you don't have to specify the option on the command line every time you do mmmount, instead of using mmchfs, one should use mmremotefs update -o. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Billich Heinrich Rainer (PSI)" To: gpfsug main discussion list Date: 07/06/2018 12:06 AM Subject: [gpfsug-discuss] -o syncnfs has no effect? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I try to mount a fs with "-o syncnfs" as we'll export it with CES/Protocols. But I never see the mount option displayed when I do # mount | grep fs-name This is a remote cluster mount, we'll run the Protocol nodes in a separate cluster. On the home cluster I see the option 'nfssync' in the output of 'mount'. My conclusion is that the mount option "syncnfs" has no effect on remote cluster mounts. Which seems a bit strange? Please can someone clarify on this? What is the impact on protocol nodes exporting remote cluster mounts? Is there any chance of data corruption? Or are some mount options implicitely inherited from the home cluster? I've read 'syncnfs' is default on Linux, but I would like to know for sure. Funny enough I can pass arbitrary options with # mmmount -o some-garbage which are silently ignored. I did 'mmchfs -o syncnfs' on the home cluster and the syncnfs option is present in /etc/fstab on the remote cluster. I did not remount on all nodes __ Thank you, I'll appreciate any hints or replies. Heiner Versions: Remote cluster 5.0.1 on RHEL7.4 (imounts the fs and runs protocol nodes) Home cluster 4.2.3-8 on RHEL6 (export the fs, owns the storage) Filesystem: 17.00 (4.2.3.0) All Linux x86_64 with Spectrum Scale Standard Edition -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Mon Jul 23 07:51:54 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 23 Jul 2018 14:51:54 +0800 Subject: [gpfsug-discuss] mmdiag --iohist question In-Reply-To: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> References: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Message-ID: Hi Please check the IO type before examining the IP address for the output of mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and we don't show it. Please check whether this is your case. 
=== mmdiag: iohist ===

I/O history:

I/O start time   RW  Buf type  disk:sectorNum  nSec  time ms  Type  Device/NSD ID  NSD node
---------------  --  --------  --------------  ----  -------  ----  -------------  ---------------
01:14:08.450177  R   inode     6:189513568     8     4.920    srv   dm-4           192.168.116.92
01:14:08.450448  R   inode     6:189513664     8     4.968    srv   dm-4           192.168.116.92
01:14:08.475689  R   inode     6:189428264     8     0.230    srv   dm-4           192.168.116.92
01:14:08.983587  W   logData   4:30686784      8     0.216    lcl   dm-0
01:14:08.983601  W   logData   3:25468480      8     0.197    lcl   dm-8
01:14:08.983961  W   inode     2:188808504     8     0.142    lcl   dm-11
01:14:08.984144  W   inode     1:188808504     8     0.134    lcl   dm-7

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "Buterbaugh, Kevin L"
To: gpfsug main discussion list
Date: 07/11/2018 10:34 PM
Subject: [gpfsug-discuss] mmdiag --iohist question
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi All,

Quick question about 'mmdiag --iohist' that is not documented in the man page ... what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ...?

This only happens occasionally ... and the way I discovered it was that our Python script - which takes 'mmdiag --iohist' output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client - started occasionally throwing an exception. When I started looking at the 'mmdiag --iohist' output itself I do see times when there is no client IP address listed for an I/O wait.

Thanks...

Kevin

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From p.childs at qmul.ac.uk Mon Jul 23 09:37:41 2018
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Mon, 23 Jul 2018 08:37:41 +0000
Subject: [gpfsug-discuss] Same file opened by many nodes / processes
In-Reply-To:
References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>
Message-ID: <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk>

On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote:

Hi there,

Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup?

Not really, it feels like a perfect storm; any one of the tasks running on its own would be fine. It's the sheer load; our mmpmon data says the storage has been flat lining when it occurs.
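(For what it's worth, the mmpmon numbers come from something simple along these lines, run on the NSD servers; the flags are from memory, so treat it as a sketch rather than the exact collector we use:

# per-file-system I/O counters as seen from this node, parseable output,
# one sample every 10 seconds for 60 samples
echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p -d 10000 -r 60

The bytes-read/bytes-written counters per file system are enough to see the back end flat-lining.)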
Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0C9372140C936C60006FF189C22582D1] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_0C9306EC0C92FECC006FF189C22582D1][cid:_1_0C9308F40C92FECC006FF189C22582D1] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. 
Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Mon Jul 23 11:13:56 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 22:13:56 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? How many quorum nodes? How many filesystems? Is the management network the same as the daemon network? On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you > tell us a bit more about the setup? > > > Not really, It feels like a perfect storm, any one of the tasks running on > its own would be fine, Its the shear load, our mmpmon data says the storage > has been flat lining when it occurs. > > Its a reasonably standard (small) HPC cluster, with a very mixed work > load, hence while we can usually find "bad" jobs from the point of view of > io on this occasion we can see a few large array jobs all accessing the > same file, the cluster runs fine until we get to a certain point and one > more will tip the balance. We've been attempting to limit the problem by > adding limits to the number of jobs in an array that can run at once. But > that feels like fire fighting. > > > Are you using GPFS API over any administrative commands? Any problems with > the network (being that Ethernet or IB)? 
> > > We're not as using the GPFS API, never got it working, which is a shame, > I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests > by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit > network currently, we're currently looking at removing all the 1GBit nodes > within the next few months and adding some new faster kit. The Storage is > attached at 40GBit but it does not look to want to run much above 5Gbit I > suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to > help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 23 12:06:20 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 11:06:20 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? 
Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [X] Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][X][X] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Mon Jul 23 12:59:22 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 23:59:22 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: Are the tiebreaker disks part of the same storage that is being used to provide disks for the NSDs of your filesystem? Having both management and daemon networks on the same network can impact the cluster in many ways. Depending on the requirements and workload conditions to run the cluster. Especially if the network is not 100% top notch or can be affected by external factors (other types of utilization). I would recur to a recent (and/or run a new one) performance benchmark result (IOR and MDTEST) and try to understand if the recordings of the current performance while observing the problem really tell something new. 
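A quick baseline of that kind can be as simple as the following; the mount point, process count and sizes are only placeholders, and the MPI launch details will differ per site:

mpirun -np 16 ior -w -r -F -t 1m -b 4g -o /gpfs/scratch/ior.testfile   # per-process streaming write then read
mpirun -np 16 mdtest -n 10000 -i 3 -d /gpfs/scratch/mdtest.dir         # metadata create/stat/remove rates

Comparing those numbers against an earlier run of the same test is usually enough to tell whether the storage itself is already at its limit.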
If not (if benchmarks tell that you are at the edge of the performance, then the best would be to consider increasing cluster performance) with additional disk hardware and/or network performance. If possible I would also recommend upgrading to the new Spectrum Scale 5 that have many new performance features. On Mon, 23 Jul 2018 at 23:06, Peter Childs wrote: > On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: > > I think the network problems need to be cleared first. Then I would > investigate further. > > Buf if that is not a trivial path... > Are you able to understand from the mmfslog what happens when the tipping > point occurs? > > > mmfslog thats not a term I've come accross before, if you mean > /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In > other words no expulsions or errors just a very slow filesystem, We've not > seen any significantly long waiters either (mmdiag --waiters) so as far as > I can see its just behaving like a very very busy filesystem. > > We've already had IBM looking at the snaps due to the rather slow mmbackup > process, all I've had back is to try increase -a ie the number of sort > threads which has speed it up to a certain extent, But once again I think > we're looking at the results of the issue not the cause. > > > In my view, when troubleshooting is not easy, the usual methods work/help > to find the next step: > - Narrow the window of troubleshooting (by discarding "for now" events > that did not happen within the same timeframe) > - Use "as precise" as possible, timebased events to read the reaction of > the cluster (via log or others) and make assumptions about other observed > situations. > - If possible and when the problem is happening, run some traces, > gpfs.snap and ask for support via PMR. > > Also, > > What is version of GPFS? > > > 4.2.3-8 > > How many quorum nodes? > > > 4 Quorum nodes with tie breaker disks, however these are not the file > system manager nodes as to fix a previous problem (with our nsd servers not > being powerful enough) our fsmanager nodes are on hardware, We have two > file system manager nodes (Which do token management, quota management etc) > they also run the mmbackup. > > How many filesystems? > > > 1, although we do have a second that is accessed via multi-cluster from > our older GPFS setup, (thats running 4.2.3-6 currently) > > Is the management network the same as the daemon network? > > > Yes. the management network and the daemon network are the same network. > > Thanks in advance > > Peter Childs > > > > On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: > > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you > tell us a bit more about the setup? > > > Not really, It feels like a perfect storm, any one of the tasks running on > its own would be fine, Its the shear load, our mmpmon data says the storage > has been flat lining when it occurs. > > Its a reasonably standard (small) HPC cluster, with a very mixed work > load, hence while we can usually find "bad" jobs from the point of view of > io on this occasion we can see a few large array jobs all accessing the > same file, the cluster runs fine until we get to a certain point and one > more will tip the balance. We've been attempting to limit the problem by > adding limits to the number of jobs in an array that can run at once. But > that feels like fire fighting. 
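(The limit described above is the Grid Engine task-concurrency option; with the 5000-task array mentioned later in the thread it would look roughly like

qsub -t 1-5000 -tc 100 job.sh   # submit the whole array, but never run more than 100 tasks at once

where the script name and the cap of 100 are only illustrative.)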
> > > Are you using GPFS API over any administrative commands? Any problems with > the network (being that Ethernet or IB)? > > > We're not as using the GPFS API, never got it working, which is a shame, > I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests > by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit > network currently, we're currently looking at removing all the 1GBit nodes > within the next few months and adding some new faster kit. The Storage is > attached at 40GBit but it does not look to want to run much above 5Gbit I > suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to > help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jul 23 13:06:22 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 23 Jul 2018 08:06:22 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk><51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/23/2018 07:06 AM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. 
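A convenient way to keep watching that cluster-wide rather than node by node is something along these lines; the 10-second cut-off is arbitrary and the exact waiter wording varies slightly between code levels:

mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --waiters > /tmp/waiters.$(date +%s)   # gather current waiters from every node
grep -Ei "waiting [0-9]{2,}[.]" /tmp/waiters.*                               # anything stuck for roughly 10s or more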
We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: Yes, we run mmbackup, using a snapshot. 
The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jul 23 19:12:25 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 23 Jul 2018 14:12:25 -0400 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? 
In-Reply-To: <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> Message-ID: <22017.1532369545@turing-police.cc.vt.edu> On Thu, 19 Jul 2018 22:23:06 -0000, "Buterbaugh, Kevin L" said: > Is this what you???re looking for (from an IBMer in response to another question a few weeks back)? > > assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: Nope, that bring zero joy (though it did give me a chance to set a more appropriate set of thresholds for our environment. And I'm still perplexed as to *where* those events are stored - what's remembering it after a 'mmhealth eventlog --clear -N all'? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 23 21:05:05 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 23 Jul 2018 20:05:05 +0000 Subject: [gpfsug-discuss] mmdiag --iohist question In-Reply-To: References: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Message-ID: Hi GPFS team, Yes, that?s what we see, too ? thanks. Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 23, 2018, at 1:51 AM, IBM Spectrum Scale > wrote: Hi Please check the IO type before examining the IP address for the output of mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and we don't show it. Please check whether this is your case. === mmdiag: iohist === I/O history: I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD node --------------- -- ----------- ----------------- ----- ------- ---- ------------------ --------------- 01:14:08.450177 R inode 6:189513568 8 4.920 srv dm-4 192.168.116.92 01:14:08.450448 R inode 6:189513664 8 4.968 srv dm-4 192.168.116.92 01:14:08.475689 R inode 6:189428264 8 0.230 srv dm-4 192.168.116.92 01:14:08.983587 W logData 4:30686784 8 0.216 lcl dm-0 01:14:08.983601 W logData 3:25468480 8 0.197 lcl dm-8 01:14:08.983961 W inode 2:188808504 8 0.142 lcl dm-11 01:14:08.984144 W inode 1:188808504 8 0.134 lcl dm-7 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Buterbaugh, Kevin L" ---07/11/2018 10:34:32 PM---Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? 
what does it From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/11/2018 10:34 PM Subject: [gpfsug-discuss] mmdiag --iohist question Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255264001433&sdata=uSiXYheeOw%2F4%2BSls8lP3XO9w7i7dFc3UWEYa%2F8aIn%2B0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 23 21:06:14 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 20:06:14 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk><51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> , Message-ID: ---- Frederick Stock wrote ---- > Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Considered, but never really understood the logic or value of building a second network, nor seen a good argument for the additional cost and work setting it up. While I've heard it lots of times, that the network is key to good gpfs performance. I've actually always found that it can be lots of other things too and your usally best keeping and open view and checking everything. This issue disappeared on Friday when the file system manager locked up entirely, and we failed it over to the other one and restarted gpfs. It's been fine all weekend, and currently it's looking to be a failed gpfs daemon on the manager node that was causing all the bad io. If I'd know that I'd have restarted gpfs on that node earlier... > > Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? > > You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? 
> Our nsd servers are virtual everything else on the cluster is real. It's a gridscaler gs7k. Hence why it's difficult to throw more power at the issue. We are looking at upgrading to 5.0.1, within the next few months as we're in the progress of adding a new ssd based scratch filesystem to the cluster. Hopefully this will help resolve some of our issues. Peter Childs. > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: Peter Childs > > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/23/2018 07:06 AM > Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: > I think the network problems need to be cleared first. Then I would investigate further. > > Buf if that is not a trivial path... > Are you able to understand from the mmfslog what happens when the tipping point occurs? > > mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. > > We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. > > > In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: > - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) > - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. > - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. > > Also, > > What is version of GPFS? > > 4.2.3-8 > > How many quorum nodes? > > 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. > > How many filesystems? > > 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) > > Is the management network the same as the daemon network? > > Yes. the management network and the daemon network are the same network. > > Thanks in advance > > Peter Childs > > > > On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? > > Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. 
> > Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. > > > Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? > > We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. > > We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > > > Yaron Daniel 94 Em Ha'Moshavot Rd > > Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel > > > > > > > From: Peter Childs > > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/23/2018 07:06 AM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. 
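(Concretely, that suggestion amounts to raising the inode-scan sort threads on the mmbackup invocation, something like

mmbackup /data/gpfs -t incremental -S backupsnap -N fsmgr1,fsmgr2 -a 8

where the filesystem, snapshot and node names are placeholders and 8 is just an example value.)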
In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. 
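QoS at this code level throttles by I/O class rather than by node, so the usual pattern, if it helps here at all, is to cap the maintenance class (the class the policy scan I/O is normally charged to) and leave everything else unlimited, for example:

mmchqos gpfs0 --enable pool=*,maintenance=5000IOPS,other=unlimited
mmlsqos gpfs0 --seconds 60    # watch per-class consumption over the last minute

with the filesystem name and the 5000 IOPS cap purely illustrative.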
We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Tue Jul 24 08:45:03 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Tue, 24 Jul 2018 09:45:03 +0200 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? In-Reply-To: <25435.1532035523@turing-police.cc.vt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: Hi, that message is still in memory. "mmhealth node eventlog --clear" deletes all old events but those which are currently active are not affected. I think this is related to multiple Collector Nodes, will dig deeper into that code to find out if some issue lurks there. As a stop-gap measure one could execute "mmsysmoncontrol restart" on the affected node(s) as this stops the monitoring process and doing so clears the event in memory. 
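Put as commands, the stop-gap is simply the following, run on each node that still reports the stale event (the show command afterwards is just to confirm the FILESYSTEM component returns to HEALTHY):

mmsysmoncontrol restart                   # stop/start the sysmonitor, dropping the stale in-memory event
mmhealth node show --unhealthy -N all     # re-check from any node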
The data used for the event comes from mmlspool (should be close or identical to mmdf) Mit freundlichen Gr??en / Kind regards Norbert Schuld From: valdis.kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Date: 20/07/2018 00:15 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? Sent by: gpfsug-discuss-bounces at spectrumscale.org So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error (archive/system) (...) Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn (archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? [attachment "attccdgx.dat" deleted by Norbert Schuld/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Tue Jul 24 14:43:52 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 24 Jul 2018 13:43:52 +0000 Subject: [gpfsug-discuss] control which hosts become token manager Message-ID: Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. 
Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From bzhang at ca.ibm.com Tue Jul 24 16:03:54 2018 From: bzhang at ca.ibm.com (Bohai Zhang) Date: Tue, 24 Jul 2018 11:03:54 -0400 Subject: [gpfsug-discuss] IBM Elastic Storage Server (ESS) Support is going to host a client facing webinar In-Reply-To: References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: Hi all, IBM Elastic Storage Server support team is going to host a webinar to discuss Spectrum Scale (GPFS) encryption. Everyone is welcome. Please use the following links to register. Thanks, NA/EU Session Date: Aug 8, 2018 Time: 10 AM - 11 AM EDT (2 PM ? 3 PM GMT) Registration: https://ibm.biz/BdY4SE Audience: Scale/ESS administrators. AP/JP/India Session Date: Aug 9, 2018 Time: 10 AM - 11 AM Beijing Time (11 AM ? 12? AM Tokyo Time) Registration: https://ibm.biz/BdY4SH Audience: Scale/ESS administrators. Regards, IBM Spectrum Computing Bohai Zhang Critical Senior Technical Leader, IBM Systems Situation Tel: 1-905-316-2727 Resolver Mobile: 1-416-897-7488 Expert Badge Email: bzhang at ca.ibm.com 3600 STEELES AVE EAST, MARKHAM, ON, L3R 9Z7, Canada Live Chat at IBMStorageSuptMobile Apps Support Portal | Fix Central | Knowledge Center | Request for Enhancement | Product SMC IBM | dWA We meet our service commitment only when you are very satisfied and EXTREMELY LIKELY to recommend IBM. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C553518.jpg Type: image/jpeg Size: 124313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C974093.gif Type: image/gif Size: 2665 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C503228.gif Type: image/gif Size: 275 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C494180.gif Type: image/gif Size: 305 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1C801702.gif Type: image/gif Size: 331 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C254205.gif Type: image/gif Size: 3621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C585014.gif Type: image/gif Size: 1243 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue Jul 24 20:28:34 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 24 Jul 2018 19:28:34 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: Message-ID: What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. >From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 24 22:12:06 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 24 Jul 2018 21:12:06 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: Message-ID: <366795a1f7b34edc985d85124f787774@jumptrading.com> Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn't a way to specify a preferred manager per FS... 
(Bryan starts typing up a new RFE...). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. >From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don't want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that 'mmdiag -tokenmgr' lists the machine as active token manager. The machine has role 'quorum-client'. This doesn't seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. 
Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Wed Jul 25 17:40:46 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 25 Jul 2018 16:40:46 +0000 Subject: [gpfsug-discuss] Brief survey question: Spectrum Scale downloads and protocols Message-ID: The Spectrum Scale team is considering a change to Scale's packaging, and we'd like to get input from as many of you as possible on the likely impact. Today, Scale is available to download in two images: With Protocols, and Without Protocols. We'd like to do away with this and in future just have one image, With Protocols. To be clear, installing Protocols will still be entirely optional -- it's only the download that will change. You can find the survey here: www.surveygizmo.com/s3/4476580/IBM-Spectrum-Scale-Packaging For those interested in a little more background... Why change this? Because making two images for every Edition for every release and patch is additional work, with added testing and more opportunities for mistakes to creep in. If it's not adding real value, we'd prefer not to keep doing it! Why do we need to ask first? Because we've been doing separate images for a long time, and there was a good reason why we started doing it. But it's not clear that the original reasons are still relevant. However, we don't want to make that assumption without asking first. Thanks in advance for your help, Carl Zetie Offering Manager for Spectrum Scale, IBM - (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From SAnderson at convergeone.com Wed Jul 25 19:57:03 2018 From: SAnderson at convergeone.com (Shaun Anderson) Date: Wed, 25 Jul 2018 18:57:03 +0000 Subject: [gpfsug-discuss] Compression details Message-ID: <1532545023753.65276@convergeone.com> I've had the question come up about how SS will handle file deletion as well as overhead required for compression using zl4. The two questions I'm looking for answers (or better yet, reference material documenting) to are: 1) - How is file deletion handled? Is the block containing the compressed file decompressed, the file deleted, and then recompressed? Or is metadata simply updated showing the file is to be deleted? Does Scale run an implicit 'mmchattr --compression no' command? 2) - Are there any guidelines on the overhead to plan for in a compressed environment (lz4)? I'm not seeing any kind of sizing guidance. This is potentially going to be for an exisitng ESS GL2 system. Any assistance or direction is appreciated. Regards, ? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments hereto may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From abeattie at au1.ibm.com Thu Jul 26 00:05:27 2018
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Wed, 25 Jul 2018 23:05:27 +0000
Subject: [gpfsug-discuss] Compression details
In-Reply-To: <1532545023753.65276@convergeone.com> References: <1532545023753.65276@convergeone.com> Message-ID:

An HTML attachment was scrubbed...
URL:

From scale at us.ibm.com Thu Jul 26 14:24:14 2018
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Thu, 26 Jul 2018 08:24:14 -0500
Subject: [gpfsug-discuss] Compression details
In-Reply-To: <1532545023753.65276@convergeone.com> References: <1532545023753.65276@convergeone.com> Message-ID:

> 1) How is file deletion handled?

This depends on whether a snapshot exists and whether copy-on-write (COW) is needed. If COW is not needed, or there is no snapshot at all, deletion is handled exactly as for a non-compressed file: the data blocks are discarded without being decompressed and the inode is deleted. Even when COW is needed, decompression before the COW is only required when one of the following conditions is true:
1) the block to be moved is not the first block of a compression group (a compression group is 10 blocks, starting at block 0);
2) the compression group ends beyond the last block of the destination file (the file in the latest snapshot);
3) the compression group is not full and the destination file is larger;
4) the compression group ends at the last block of the destination file, but the source and destination files differ in size;
5) the destination file already has some allocated (COWed) blocks within the compression group.

> 2) Are there any guidelines

The lz4 algorithm already makes a good trade-off between performance and compression ratio, so the overhead really depends on your data characteristics and access patterns. For example, if the data is written once but read many times, there shouldn't be much overhead, because the data is compressed only once (and lz4 decompression costs less than compression). If your data is essentially random, lz4 won't save much space, yet the data still has to be compressed, and decompressed when read. Note, however, that compression also reduces the load on storage and the network, because smaller I/Os are issued for a compressed file, so from the application's overall point of view there may be no added overhead at all.

Regards, The Spectrum Scale (GPFS) team

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pierre-marie.brunet at cnes.fr Fri Jul 27 01:06:44 2018
From: pierre-marie.brunet at cnes.fr (Brunet Pierre-Marie)
Date: Fri, 27 Jul 2018 00:06:44 +0000
Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil)
Message-ID:

Hi,

We are facing the same issue: we just upgraded our cluster to GPFS 4.2.3-9 and RHEL 7.5, with 4 gateway servers running kernel NFS...
=> random "Unknown error 521" on NFS clients.

Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed!) up to now it seems to work properly.

Is there any official recommendation from IBM on this problem?
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. 
> GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** From scale at us.ibm.com Fri Jul 27 12:56:02 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 06:56:02 -0500 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: References: Message-ID: errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Brunet Pierre-Marie To: "gpfsug-discuss at spectrumscale.org" Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. 
Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From xhejtman at ics.muni.cz Fri Jul 27 13:06:11 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 27 Jul 2018 14:06:11 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: References: Message-ID: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> Hello, no it is not. It's a bug in GPFS vfs layer, efix has been already released. On Fri, Jul 27, 2018 at 06:56:02AM -0500, IBM Spectrum Scale wrote: > > errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. > > > /* Defined for the NFSv3 protocol */ > #define EBADHANDLE 521 /* Illegal NFS file handle */ > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/26/2018 07:17 PM > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 > and RHEL 7.5 with 4 gateways servers executing Kernel NFS... > => random "Unknown error 521" on NFS clients. > > Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers > crossed !) up to now, it seems to work properly. > > Is there any official recommendation from IBM on this problem ? 
> > Regards, > PM > -- > HPC center > French space agency > > -----Message d'origine----- > De?: gpfsug-discuss-bounces at spectrumscale.org > De la part de > gpfsug-discuss-request at spectrumscale.org > Envoy??: jeudi 14 juin 2018 13:00 > ??: gpfsug-discuss at spectrumscale.org > Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific than > "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) > 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 13 Jun 2018 17:45:44 +0300 > From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > > > > Content-Type: text/plain; charset="iso-8859-1" > > Please open a service ticket > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add > > HA > > > to NFS on top of GPFS - > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > > > > ). > > > > knfs and cNFS can't coexist with CES in the same environment. > > well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 13 Jun 2018 15:14:53 +0000 > From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > > > > Content-Type: text/plain; charset="utf-8" > > We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version > 3.10.0-693.21.1.el7.x86_64 and are not having any errors. > So it's probably just GPFS not being ready for 7.5 yet. > > Neil. > > Neil Wilson? Senior IT Practitioner > Storage, Virtualisation and Mainframe Team?? 
IT Services > Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard > Sent: 13 June 2018 10:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > > On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > > Hello, > > > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > > Not sure whether it is due to kernel or GPFS. > > > > GPFS being not supported on 7.5 at this time would be the starting point. I > am also under the impression that kernel NFS was not supported either it's > Ganesha or nothing. > > The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the > past that has worked for me. > > JAB. > > -- > Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System > Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > End of gpfsug-discuss Digest, Vol 77, Issue 19 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From neil.wilson at metoffice.gov.uk Fri Jul 27 13:26:28 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 27 Jul 2018 12:26:28 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: We are still running 7.4 with 4.2.3-9 on our NSD nodes, cNFS nodes and client nodes. A rhel 7.5 client node build is being tested at the moment and will be deployed if testing is a success. However I don't think we will be upgrading the NSD nodes or cNFS nodes to 7.5 for a while. Regards Neil Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of IBM Spectrum Scale Sent: 27 July 2018 12:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. [Inactive hide details for Brunet Pierre-Marie ---07/26/2018 07:17:25 PM---Hi, We are facing the same issue : we just upgrade o]Brunet Pierre-Marie ---07/26/2018 07:17:25 PM---Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? Regards, PM -- HPC center French space agency -----Message d'origine----- De : gpfsug-discuss-bounces at spectrumscale.org > De la part de gpfsug-discuss-request at spectrumscale.org Envoy? : jeudi 14 juin 2018 13:00 ? : gpfsug-discuss at spectrumscale.org Objet : gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: > Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: > Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From pierre-marie.brunet at cnes.fr Fri Jul 27 14:56:04 2018 From: pierre-marie.brunet at cnes.fr (Brunet Pierre-Marie) Date: Fri, 27 Jul 2018 13:56:04 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (IBM Spectrum Scale) Message-ID: Hi Scale Team, I know but I can't reproduce the problem with a simple kernel NFS server on a RH7.5 with a local filesystem for instance. It seems to be linked somehow with GPFS 4.2.3-9... I don't know what is the behavior with previous release. But as I said, the downgrade to RHE7.4 has solved the problem... vicious bug for sure. 
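For anyone hitting this, a minimal way to record exactly which Scale build and kernel each gateway is running, before and after such a downgrade, so that reports can be compared against the efix levels discussed later in this thread (standard commands only):

uname -r            # kernel level, e.g. 3.10.0-862.x on RHEL 7.5
rpm -qa 'gpfs*'     # installed GPFS packages
mmdiag --version    # build level of the running daemon (efix levels normally show here)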
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: vendredi 27 juillet 2018 14:22 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 78, Issue 68 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (IBM Spectrum Scale) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (Lukas Hejtmanek) ---------------------------------------------------------------------- Message: 1 Date: Fri, 27 Jul 2018 06:56:02 -0500 From: "IBM Spectrum Scale" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: Content-Type: text/plain; charset="iso-8859-1" errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Brunet Pierre-Marie To: "gpfsug-discuss at spectrumscale.org" Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? 
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. 
It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ Message: 2 Date: Fri, 27 Jul 2018 14:06:11 +0200 From: Lukas Hejtmanek To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: <20180727120611.aunjlxht33vp7txf at ics.muni.cz> Content-Type: text/plain; charset=utf8 Hello, no it is not. It's a bug in GPFS vfs layer, efix has been already released. On Fri, Jul 27, 2018 at 06:56:02AM -0500, IBM Spectrum Scale wrote: > > errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. > > > /* Defined for the NFSv3 protocol */ > #define EBADHANDLE 521 /* Illegal NFS file handle */ > > > Regards, The Spectrum Scale (GPFS) team > > ---------------------------------------------------------------------- > -------------------------------------------- > > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks > Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please > contact > 1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be > used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/26/2018 07:17 PM > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > We are facing the same issue : we just upgrade our cluster to GPFS > 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... > => random "Unknown error 521" on NFS clients. > > Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers > crossed !) up to now, it seems to work properly. > > Is there any official recommendation from IBM on this problem ? 
> > Regards, > PM > -- > HPC center > French space agency > > -----Message d'origine----- > De?: gpfsug-discuss-bounces at spectrumscale.org > De la part de > gpfsug-discuss-request at spectrumscale.org > Envoy??: jeudi 14 juin 2018 13:00 > ??: gpfsug-discuss at spectrumscale.org > Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than > "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) > 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 13 Jun 2018 17:45:44 +0300 > From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > llabserv.com> > > > Content-Type: text/plain; charset="iso-8859-1" > > Please open a service ticket > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will > > add HA > > > to NFS on top of GPFS - > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.sp > ectrum.scale.v5r01.doc/bl1adv_cnfs.htm > > > > ). > > > > knfs and cNFS can't coexist with CES in the same environment. > > well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > -------------- next part -------------- An HTML attachment was > scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fb > ce/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 13 Jun 2018 15:14:53 +0000 > From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > changelabs.com> > > > Content-Type: text/plain; charset="utf-8" > > We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version > 3.10.0-693.21.1.el7.x86_64 and are not having any errors. > So it's probably just GPFS not being ready for 7.5 yet. > > Neil. > > Neil Wilson? Senior IT Practitioner > Storage, Virtualisation and Mainframe Team?? 
IT Services Met Office > FitzRoy Road Exeter Devon EX1 3PB United Kingdom > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan > Buzzard > Sent: 13 June 2018 10:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > > On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > > Hello, > > > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > > Not sure whether it is due to kernel or GPFS. > > > > GPFS being not supported on 7.5 at this time would be the starting > point. I am also under the impression that kernel NFS was not > supported either it's Ganesha or nothing. > > The interim fix is probably to downgrade to a 7.4 kernel. Certainly in > the past that has worked for me. > > JAB. > > -- > Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC > System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > End of gpfsug-discuss Digest, Vol 77, Issue 19 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 68 ********************************************** From scale at us.ibm.com Fri Jul 27 15:43:16 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 09:43:16 -0500 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> References: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> Message-ID: There is a fix in 4.2.3.9 efix3 that corrects a condition where GPFS was failing a revalidate call and that was causing kNFS to generate EBADHANDLE. Without more information on your case (traces), I cannot say for sure that this will resolve your issue, but it is available for you to try. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:18:50 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:18:50 +0000 Subject: [gpfsug-discuss] Power9 / GPFS Message-ID: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? 
So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 27 16:30:42 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 27 Jul 2018 15:30:42 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> Hi Simon, Have you tried running it with the ??silent? flag, too? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 27, 2018, at 10:18 AM, Simon Thompson > wrote: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. 
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338&sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Fri Jul 27 16:32:55 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 15:32:55 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: <366795a1f7b34edc985d85124f787774@jumptrading.com> References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Bryan Banister Reply-To: gpfsug main discussion list Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. 
I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
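For the multicluster case Heiner describes, a minimal set of checks on both clusters helps show where the token load actually lands (commands as used earlier in the thread; the node name is the one from Heiner's output and is only illustrative):

# On the local, storage-less cluster: confirm the VM is designated quorum only
mmlscluster
# If a node has ended up designated quorum-manager by mistake, demote it to client
mmchnode --client -N node-quorum.psi.ch
# On the owning (home) cluster, where the file system managers run:
mmlsmgr
mmdiag --tokenmgr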
-------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Fri Jul 27 16:40:11 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 15:40:11 +0000 Subject: [gpfsug-discuss] Power9 / GPFS Message-ID: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> Hello If you don?t need the installer maybe just extract the RPMs, this bypasses java. For x86_64 I use commands like the once below, shouldn?t be much different on power. TARFILE=$1 START=$( grep -a -m 1 ^PGM_BEGIN_TGZ= $TARFILE| cut -d= -f2) echo extract RPMs from $TARFILE with START=$START tail -n +$START $TARFILE | tar xvzf - *.rpm */repodata/* Kind regards, Heiner -- Paul Scherrer Institut From: on behalf of Simon Thompson Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:19 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:41:39 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:41:39 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> References: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> Message-ID: Yeah does the same ? The system java seems to do it is well ? maybe its just broken ? 
Simon From: on behalf of "Buterbaugh, Kevin L" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 16:32 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Hi Simon, Have you tried running it with the ??silent? flag, too? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 27, 2018, at 10:18 AM, Simon Thompson > wrote: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338&sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Jul 27 16:35:14 2018 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 27 Jul 2018 15:35:14 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: License acceptance notwithstanding, the RPM extraction should at least be achievable with? 
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: Friday, July 27, 2018 11:19 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:54:16 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:54:16 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> References: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> Message-ID: <986024E4-512D-45A0-A859-EBED468B07A3@bham.ac.uk> Thanks, (and also Paul with a very similar comment)? I now have my packages unpacked ? and hey, who needs java anyway ? Simon From: on behalf of "heiner.billich at psi.ch" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 16:40 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Hello If you don?t need the installer maybe just extract the RPMs, this bypasses java. For x86_64 I use commands like the once below, shouldn?t be much different on power. 
TARFILE=$1 START=$( grep -a -m 1 ^PGM_BEGIN_TGZ= $TARFILE| cut -d= -f2) echo extract RPMs from $TARFILE with START=$START tail -n +$START $TARFILE | tar xvzf - *.rpm */repodata/* Kind regards, Heiner -- Paul Scherrer Institut From: on behalf of Simon Thompson Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:19 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 27 17:02:42 2018 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 27 Jul 2018 11:02:42 -0500 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? 
So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 26117 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 17:05:37 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 16:05:37 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> # uname -a Linux localhost.localdomain 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux Its literally out of the box ? Simon From: on behalf of "gcorneau at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com [cid:_2_DC560798DC56051000576CD7862582D7] From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I feel like I must be doing something stupid here but ? 
We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 26118 bytes Desc: image001.jpg URL: From heiner.billich at psi.ch Fri Jul 27 17:50:17 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 16:50:17 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Hello, So probably I was wrong from the beginning ? please can somebody clarify: In a multicluster environment with all storage and filesystem hosted by a single cluster all token managers will reside in this central cluster? Or are there also token managers in the storage-less clusters which just mount? This managers wouldn?t be accessible by all nodes which access the file system, hence I doubt this exists. Still it would be nice to know how to influence the token manager placement and how to exclude certain machines. And the output of ?mmdiag ?tokenmgr? indicates that there _are_ token manager in the remote-mounting cluster ? confusing. I would greatly appreciate if somebody could sort this out. A point to the relevant documentation would also be welcome. 
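One way to cross-check this, assuming admin access on both clusters, is to compare the token server IPs reported on a client node with the daemon addresses of the owning cluster:

  mmdiag --tokenmgr     # on a node that mounts the file system: lists the appointed token server IPs
  mmlscluster           # on the owning (home) cluster: map those IPs back to node names and roles

If every listed IP belongs to a manager node of the home cluster, the remote-mounting cluster is not serving tokens itself.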
Thank you & Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of "Billich Heinrich Rainer (PSI)" Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Bryan Banister Reply-To: gpfsug main discussion list Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. 
Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jul 27 18:09:56 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 27 Jul 2018 17:09:56 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: <40989560bbc0448896e0301407388790@jumptrading.com> Yes, the token managers will reside on the NSD Server Cluster which has the NSD Servers that provide access to the underlying data and metadata storage. I believe that all nodes that have the ?manager? designation will participate in the token management operations as needed. Though there is not a way to specify which node will be assigned the primary file system manager or overall cluster manager, which are two different roles but may reside on the same node. Tokens themselves, however, are distributed and managed by clients directly. When a file is first opened then the node that opened the file will be the ?metanode? 
for the file, and all metadata updates on the file will be handled by this metanode until it closes the file handle, in which case another node will become the ?metanode?. For byte range locking, the file system manager will handle revoking tokens from nodes that have a byte range lock when another node requests access to the same byte range region. This ensures that nodes cannot hold byte range locks that prevent other nodes from accessing byte range regions of a file. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Billich Heinrich Rainer (PSI) Sent: Friday, July 27, 2018 11:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ Hello, So probably I was wrong from the beginning ? please can somebody clarify: In a multicluster environment with all storage and filesystem hosted by a single cluster all token managers will reside in this central cluster? Or are there also token managers in the storage-less clusters which just mount? This managers wouldn?t be accessible by all nodes which access the file system, hence I doubt this exists. Still it would be nice to know how to influence the token manager placement and how to exclude certain machines. And the output of ?mmdiag ?tokenmgr? indicates that there _are_ token manager in the remote-mounting cluster ? confusing. I would greatly appreciate if somebody could sort this out. A point to the relevant documentation would also be welcome. Thank you & Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: > on behalf of "Billich Heinrich Rainer (PSI)" > Reply-To: gpfsug main discussion list > Date: Friday 27 July 2018 at 17:33 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: > on behalf of Bryan Banister > Reply-To: gpfsug main discussion list > Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. 
I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 27 18:31:46 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 12:31:46 -0500 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Only nodes in the home cluster will participate as token managers. Note that "mmdiag --tokenmgr" lists all potential token manager nodes, but there will be additional information for the nodes that are currently appointed. --tokenmgr Displays information about token management. For each mounted GPFS file system, one or more token manager nodes is appointed. The first token manager is always colocated with the file system manager, while other token managers can be appointed from the pool of nodes with the manager designation. The information that is shown here includes the list of currently appointed token manager nodes and, if the current node is serving as a token manager, some statistics about prior token transactions. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 27 19:27:19 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 20:27:19 +0200 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data In-Reply-To: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> Message-ID: Hi, as there are more often similar questions rising, we just put an article about the topic on the Spectrum Scale Wiki https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20 (GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data While there will be some minor updates on the article in the next time, it might already explain your questions. 
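As a toy illustration of the arithmetic only (not of the bridge internals), assume each of the two NSD servers reports 200 MB accumulated per 2-second sensor period: summing across the nodes is correct, but the sum still has to be divided by the period (or downsampled to the sensor period) before it can be read as a per-second rate:

  period=2                  # zimon sensor period in seconds
  node_a=200; node_b=200    # MB reported per sample by each NSD server
  echo "$(( (node_a + node_b) / period )) MB/s"   # 200 MB/s, the real rate
  echo "$(( node_a + node_b )) MB/s"              # 400 MB/s, what a per-second assumption displays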
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 13.07.2018 12:08 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Achim.Rehor at de.ibm.com Sat Jul 28 10:16:04 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Sat, 28 Jul 2018 11:16:04 +0200 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Mon Jul 30 16:27:28 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 30 Jul 2018 15:27:28 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> References: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> Message-ID: <24C8CF4A-D0D9-4DC0-B499-6B64D50DF3BC@bham.ac.uk> Just to close the loop on this, this is a bug in the RHEL7.5 first shipped alt kernel for the P9 systems. Patching to a later kernel errata package fixed the issues. I?ve confirmed that upgrading and re-running the installer works fine. Thanks to Julian who contacted me off-list about this. Simon From: on behalf of Simon Thompson Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:06 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS # uname -a Linux localhost.localdomain 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux Its literally out of the box ? Simon From: on behalf of "gcorneau at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com [cid:_2_DC560798DC56051000576CD7862582D7] From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. 
First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 26119 bytes Desc: image001.jpg URL: From Renar.Grunenberg at huk-coburg.de Tue Jul 31 10:03:54 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 31 Jul 2018 09:03:54 +0000 Subject: [gpfsug-discuss] Question about mmsdrrestore Message-ID: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> Hallo All, are there some experiences about the possibility to install/upgrade some existing nodes in a GPFS 4.2.3.x Cluster (OS Rhel6.7) with a fresh OS install to rhel7.5 and reinstall then new GPFS code 5.0.1.1 and do a mmsdrrestore on these node from a 4.2.3 Node. Is it possible, or must we install 4.2.3 Code first, make the mmsdrestore step and then update to 5.0.1.1? Any Hints are appreciate. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Jul 31 10:09:37 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 31 Jul 2018 09:09:37 +0000 Subject: [gpfsug-discuss] Question about mmsdrrestore In-Reply-To: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> References: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> Message-ID: My gut feeling says it?s not possible. If this were me I?d upgrade to 5.0.1.1, make sure it?s working, and then reinstall the node. 
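For what it's worth, if the direct route is attempted anyway, a rough sketch of the restore sequence would be as follows (the node name is a placeholder, and whether a 4.2.3 node will serve the configuration to a 5.0.1.1 node is exactly the open question here):

  # on the reinstalled RHEL 7.5 node, after installing the 5.0.1.1 RPMs
  /usr/lpp/mmfs/bin/mmbuildgpl                              # build the portability layer for the new kernel
  /usr/lpp/mmfs/bin/mmsdrrestore -p good-node.example.com   # pull the cluster configuration from a healthy node
  /usr/lpp/mmfs/bin/mmstartup                               # start GPFS on the restored node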
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar Sent: 31 July 2018 10:04 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Question about mmsdrrestore Hallo All, are there some experiences about the possibility to install/upgrade some existing nodes in a GPFS 4.2.3.x Cluster (OS Rhel6.7) with a fresh OS install to rhel7.5 and reinstall then new GPFS code 5.0.1.1 and do a mmsdrrestore on these node from a 4.2.3 Node. Is it possible, or must we install 4.2.3 Code first, make the mmsdrestore step and then update to 5.0.1.1? Any Hints are appreciate. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jul 31 14:03:52 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 31 Jul 2018 13:03:52 +0000 Subject: [gpfsug-discuss] mmdf vs. df Message-ID: Hallo All, a question whats happening here: We are on GPFS 5.0.1.1 and host a TSM-Server-Cluster. A colleague from me want to add new nsd?s to grow its tsm-storagepool (filedevice class volumes). The tsmpool fs has before 45TB of space after that 128TB. We create new 50 GB tsm-volumes with define volume cmd, but the cmd goes in error after the allocating of 89TB. 
Following Outputs here: [root at node_a tsmpool]# df -hT Filesystem Type Size Used Avail Use% Mounted on tsmpool gpfs 128T 128T 44G 100% /gpfs/tsmpool root at node_a tsmpool]# mmdf tsmpool --block-size auto disk disk size failure holds holds free free name group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: system (Maximum disk size allowed is 839.99 GB) nsd_r2g8f_tsmpool_001 100G 0 Yes No 88G ( 88%) 10.4M ( 0%) nsd_c4g8f_tsmpool_001 100G 1 Yes No 88G ( 88%) 10.4M ( 0%) nsd_g4_tsmpool 256M 2 No No 0 ( 0%) 0 ( 0%) ------------- -------------------- ------------------- (pool total) 200.2G 176G ( 88%) 20.8M ( 0%) Disks in storage pool: data01 (Maximum disk size allowed is 133.50 TB) nsd_r2g8d_tsmpool_016 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_015 8T 0 No Yes 3.205T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_014 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_013 8T 0 No Yes 3.206T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_012 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_011 8T 0 No Yes 3.205T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_001 8T 0 No Yes 1.48G ( 0%) 14.49M ( 0%) nsd_r2g8d_tsmpool_002 8T 0 No Yes 1.582G ( 0%) 16.12M ( 0%) nsd_r2g8d_tsmpool_003 8T 0 No Yes 1.801G ( 0%) 14.7M ( 0%) nsd_r2g8d_tsmpool_004 8T 0 No Yes 1.629G ( 0%) 15.21M ( 0%) nsd_r2g8d_tsmpool_005 8T 0 No Yes 1.609G ( 0%) 14.22M ( 0%) nsd_r2g8d_tsmpool_006 8T 0 No Yes 1.453G ( 0%) 17.4M ( 0%) nsd_r2g8d_tsmpool_010 8T 0 No Yes 3.208T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_009 8T 0 No Yes 3.197T ( 40%) 7.867M ( 0%) nsd_r2g8d_tsmpool_007 8T 0 No Yes 3.194T ( 40%) 7.875M ( 0%) nsd_r2g8d_tsmpool_008 8T 0 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_016 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_006 8T 1 No Yes 888M ( 0%) 21.63M ( 0%) nsd_c4g8d_tsmpool_005 8T 1 No Yes 996M ( 0%) 18.22M ( 0%) nsd_c4g8d_tsmpool_004 8T 1 No Yes 920M ( 0%) 11.21M ( 0%) nsd_c4g8d_tsmpool_003 8T 1 No Yes 984M ( 0%) 14.7M ( 0%) nsd_c4g8d_tsmpool_002 8T 1 No Yes 1.082G ( 0%) 11.89M ( 0%) nsd_c4g8d_tsmpool_001 8T 1 No Yes 1.035G ( 0%) 14.49M ( 0%) nsd_c4g8d_tsmpool_007 8T 1 No Yes 3.281T ( 41%) 7.867M ( 0%) nsd_c4g8d_tsmpool_008 8T 1 No Yes 3.199T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_009 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_010 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_011 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_012 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_013 8T 1 No Yes 3.195T ( 40%) 7.867M ( 0%) nsd_c4g8d_tsmpool_014 8T 1 No Yes 3.195T ( 40%) 7.875M ( 0%) nsd_c4g8d_tsmpool_015 8T 1 No Yes 3.194T ( 40%) 7.867M ( 0%) ------------- -------------------- ------------------- (pool total) 256T 64.09T ( 25%) 341.6M ( 0%) ============= ==================== =================== (data) 256T 64.09T ( 25%) 341.6M ( 0%) (metadata) 200G 176G ( 88%) 20.8M ( 0%) ============= ==================== =================== (total) 256.2T 64.26T ( 25%) 362.4M ( 0%) In GPFS we had already space but the above df seems to be wrong and that make TSM unhappy. 
If we manually wrote a 50GB File in this FS like: [root at sap00733 tsmpool]# dd if=/dev/zero of=/gpfs/tsmpool/output bs=2M count=25600 25600+0 records in 25600+0 records out 53687091200 bytes (54 GB) copied, 30.2908 s, 1.8 GB/s We see at df level now these: [root at sap00733 tsmpool]# df -hT Filesystem Type Size Used Avail Use% Mounted on tsmpool gpfs 128T 96T 33T 75% /gpfs/tsmpool if we delete these file we see already the first output of 44G free space only. This seems to be the os df Interface seems to be brocken here. What I also must mentioned we use some ignore parameters: root @node_a(rhel7.4)> mmfsadm dump config |grep ignore ignoreNonDioInstCount 0 ! ignorePrefetchLUNCount 1 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ! ignoreReplicationOnStatfs 1 ignoreSync 0 the fs has the -S relatime option. Are there any Known bug here existend ? Any hints on that? Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.utermann at physik.uni-augsburg.de Tue Jul 31 16:02:51 2018 From: ralf.utermann at physik.uni-augsburg.de (Ralf Utermann) Date: Tue, 31 Jul 2018 17:02:51 +0200 Subject: [gpfsug-discuss] Question about mmsdrrestore In-Reply-To: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> References: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de> Message-ID: <1de976d6-bc61-b1ff-b953-b28886f8e2c4@physik.uni-augsburg.de> Hi Renar, we reinstalled a previous Debian jessie + GPFS 4.2.3 client to Ubuntu 16.04 + GPFS 5.0.1-1 and did a mmsdrrestore from one of our 4.2.3.8 NSD servers without problems. regards, Ralf On 31.07.2018 11:03, Grunenberg, Renar wrote: > Hallo All, > > ? > > are there some experiences about the possibility to install/upgrade some > existing nodes in a GPFS 4.2.3.x Cluster (OS Rhel6.7) with a fresh OS install to > rhel7.5 and reinstall then new GPFS code 5.0.1.1 > > and do a mmsdrrestore on these node from a 4.2.3 Node. Is it possible, or must > we install 4.2.3 Code first, make the mmsdrestore step and then update to 5.0.1.1? > > Any Hints are appreciate. > > Renar?Grunenberg > Abteilung Informatik ? 
Betrieb > > HUK-COBURG > Bahnhofsplatz > 96444 Coburg > Telefon: 09561 96-44110 > Telefax: 09561 96-44104 > E-Mail: Renar.Grunenberg at huk-coburg.de > Internet: www.huk.de > > -------------------------------------------------------------------------------- > HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands > a. G. in Coburg > Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 > Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg > Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. > Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav > Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas. > -------------------------------------------------------------------------------- > Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. > Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich > erhalten haben, > informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist > nicht gestattet. > > This information may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this information in > error) please notify the > sender immediately and destroy this information. > Any unauthorized copying, disclosure or distribution of the material in this > information is strictly forbidden. > -------------------------------------------------------------------------------- > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Ralf Utermann _____________________________________________________________________ Universität Augsburg, Institut für Physik -- EDV-Betreuer Universitätsstr.1 D-86135 Augsburg Phone: +49-821-598-3231 SMTP: Ralf.Utermann at Physik.Uni-Augsburg.DE Fax: -3411
From YARD at il.ibm.com Sun Jul 1 18:17:42 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 1 Jul 2018 20:17:42 +0300 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> Message-ID: Hi, there was an issue with the Scale 5.x GUI error - ib_rdma_nic_unrecognized(mlx5_0/2) Check if you have the patch: [root at gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py 229c229,230 < recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) --- > #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) > recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm)) And then restart the monitor: mmsysmoncontrol restart Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 06/28/2018 11:16 AM Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Sent by: gpfsug-discuss-bounces at spectrumscale.org Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've a GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK when running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts !
verbsPorts mlx5_0/1 [root at sf-gssio1 ~]# ibstat mlx5_0 CA 'mlx5_0' CA type: MT4113 Number of ports: 2 Firmware version: 10.16.1020 Hardware version: 0 Node GUID: 0xec0d9a03002b5db0 System image GUID: 0xec0d9a03002b5db0 Port 1: State: Active Physical state: LinkUp Rate: 56 Base lid: 42 LMC: 0 SM lid: 1 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db0 Link layer: InfiniBand Port 2: State: Down Physical state: Disabled Rate: 10 Base lid: 65535 LMC: 0 SM lid: 0 Capability mask: 0x26516848 Port GUID: 0xec0d9a03002b5db8 Link layer: InfiniBand That event is there since 145 days and I didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how I can get rid of this event and restore the mmhealth's output to HEALTHY ? This is important because I've nagios sensors that periodically parse the "mmhealth -Y ..." output and at the moment I've to disable their email notification (which is not good if some real bad event happens). Thanks, Alvise _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From oehmes at gmail.com Mon Jul 2 06:26:16 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 2 Jul 2018 07:26:16 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi, most traditional raid controllers can't deal well with blocksizes above 4m, which is why the new default is 4m and i would leave it at that unless you know for sure you get better performance with 8mb which typically requires your raid controller volume full block size to be 8mb with maybe a 8+2p @1mb strip size (many people confuse strip size with full track size) . if you don't have dedicated SSDs for metadata i would recommend to just use a 4mb blocksize with mixed data and metadata disks, if you have a reasonable number of SSD's put them in a raid 1 or raid 10 and use them as dedicated metadata and the other disks as dataonly , but i would not use the --metadata-block-size parameter as it prevents the datapool to use large number of subblocks. 
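A quick way to see the arithmetic behind this, using only the figures from the mmlsfs listings quoted in this thread (the 8 KiB subblock and the per-pool fragment sizes below are read off those listings, not taken from the documentation), is shell arithmetic:

echo $(( 512*1024 / 8192 ))      # 512 KiB metadata block / 8 KiB subblock = 64 subblocks per full block
echo $(( 8*1024*1024 / 64 ))     # the 8 MiB data pool then gets 8 MiB / 64 = 131072-byte (128 KiB) fragments
echo $(( 4*1024*1024 / 8192 ))   # a uniform 4 MiB block size keeps the 8 KiB subblock, i.e. 512 subblocks per full block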
as long as your SSDs are on raid 1 or 10 there is no read/modify/write penalty, so using them with the 4mb blocksize has no real negative impact at least on controllers i have worked with. hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > On 6/26/18 12:21 AM, Sven Oehme wrote: > > Joseph, > > the subblocksize will be derived from the smallest blocksize in the > filesytem, given you specified a metadata block size of 512k thats what > will be used to calculate the number of subblocks, even your data pool is > 4mb. > is this setup for a traditional NSD Setup or for GNR as the > recommendations would be different. > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: > >> Quick question, anyone know why GPFS wouldn't respect the default for >> the subblocks-per-full-block parameter when creating a new filesystem? >> I'd expect it to be set to 512 for an 8MB block size but my guess is >> that also specifying a metadata-block-size is interfering with it (by >> being too small). This was a parameter recommended by the vendor for a >> 4.2 installation with metadata on dedicated SSDs in the system pool, any >> best practices for 5.0? I'm guessing I'd have to bump it up to at least >> 4MB to get 512 subblocks for both pools. >> >> fs1 created with: >> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >> /gpfs/fs1 >> >> # mmlsfs fs1 >> >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) >> size in bytes (system pool) >> 131072 Minimum fragment (subblock) >> size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> >> -B 524288 Block size (system pool) >> 8388608 Block size (other pools) >> >> -V 19.01 (5.0.1.0) File system version >> >> --subblocks-per-full-block 64 Number of subblocks per >> full block >> -P system;DATA Disk storage pools in file >> system >> >> >> Thanks! >> --Joey Mendoza >> NCAR >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantllama at gmail.com Mon Jul 2 07:55:07 2018 From: mutantllama at gmail.com (Carl) Date: Mon, 2 Jul 2018 16:55:07 +1000 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi Sven, What is the resulting indirect-block size with a 4mb metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k? Cheers, Carl. On Mon, 2 Jul 2018 at 15:26, Sven Oehme wrote: > Hi, > > most traditional raid controllers can't deal well with blocksizes above > 4m, which is why the new default is 4m and i would leave it at that unless > you know for sure you get better performance with 8mb which typically > requires your raid controller volume full block size to be 8mb with maybe a > 8+2p @1mb strip size (many people confuse strip size with full track > size) . 
> if you don't have dedicated SSDs for metadata i would recommend to just > use a 4mb blocksize with mixed data and metadata disks, if you have a > reasonable number of SSD's put them in a raid 1 or raid 10 and use them as > dedicated metadata and the other disks as dataonly , but i would not use > the --metadata-block-size parameter as it prevents the datapool to use > large number of subblocks. > as long as your SSDs are on raid 1 or 10 there is no read/modify/write > penalty, so using them with the 4mb blocksize has no real negative impact > at least on controllers i have worked with. > > hope this helps. > > On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > >> Hi, it's for a traditional NSD setup. >> >> --Joey >> >> On 6/26/18 12:21 AM, Sven Oehme wrote: >> >> Joseph, >> >> the subblocksize will be derived from the smallest blocksize in the >> filesytem, given you specified a metadata block size of 512k thats what >> will be used to calculate the number of subblocks, even your data pool is >> 4mb. >> is this setup for a traditional NSD Setup or for GNR as the >> recommendations would be different. >> >> sven >> >> On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: >> >>> Quick question, anyone know why GPFS wouldn't respect the default for >>> the subblocks-per-full-block parameter when creating a new filesystem? >>> I'd expect it to be set to 512 for an 8MB block size but my guess is >>> that also specifying a metadata-block-size is interfering with it (by >>> being too small). This was a parameter recommended by the vendor for a >>> 4.2 installation with metadata on dedicated SSDs in the system pool, any >>> best practices for 5.0? I'm guessing I'd have to bump it up to at least >>> 4MB to get 512 subblocks for both pools. >>> >>> fs1 created with: >>> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >>> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >>> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >>> /gpfs/fs1 >>> >>> # mmlsfs fs1 >>> >>> >>> flag value description >>> ------------------- ------------------------ >>> ----------------------------------- >>> -f 8192 Minimum fragment (subblock) >>> size in bytes (system pool) >>> 131072 Minimum fragment (subblock) >>> size in bytes (other pools) >>> -i 4096 Inode size in bytes >>> -I 32768 Indirect block size in bytes >>> >>> -B 524288 Block size (system pool) >>> 8388608 Block size (other pools) >>> >>> -V 19.01 (5.0.1.0) File system version >>> >>> --subblocks-per-full-block 64 Number of subblocks per >>> full block >>> -P system;DATA Disk storage pools in file >>> system >>> >>> >>> Thanks! >>> --Joey Mendoza >>> NCAR >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Mon Jul 2 08:46:25 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 2 Jul 2018 09:46:25 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Jul 2 08:55:10 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 2 Jul 2018 09:55:10 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Olaf, he is talking about indirect size not subblock size . Carl, here is a screen shot of a 4mb filesystem : [root at p8n15hyp ~]# mmlsfs all_local File system attributes for /dev/fs2-4m-07: ========================================== flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 4194304 Block size -Q none Quotas accounting enabled none Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Mon Jun 18 12:30:54 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 4000000000 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in file system -A no Automatic mount option -o none Additional mount options -T /gpfs/fs2-4m-07 Default mount point --mount-priority 0 Mount priority as you can see indirect size is 32k sven On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: > HI Carl, > 8k for 4 M Blocksize > files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at > least one "subblock" be allocated .. > > in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is > retrieved from the blocksize ... > since R >5 (so new created file systems) .. the new default block size is > 4 MB, fragment size is 8k (512 subblocks) > for even larger block sizes ... more subblocks are available per block > so e.g. > 8M .... 1024 subblocks (fragment size is 8 k again) > > @Sven.. correct me, if I'm wrong ... > > > > > > > From: Carl > > To: gpfsug main discussion list > Date: 07/02/2018 08:55 AM > Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Sven, > > What is the resulting indirect-block size with a 4mb metadata block size? 
> > Does the new sub-block magic mean that it will take up 32k, or will it > occupy 128k? > > Cheers, > > Carl. > > > On Mon, 2 Jul 2018 at 15:26, Sven Oehme <*oehmes at gmail.com* > > wrote: > Hi, > > most traditional raid controllers can't deal well with blocksizes above > 4m, which is why the new default is 4m and i would leave it at that unless > you know for sure you get better performance with 8mb which typically > requires your raid controller volume full block size to be 8mb with maybe a > 8+2p @1mb strip size (many people confuse strip size with full track size) . > if you don't have dedicated SSDs for metadata i would recommend to just > use a 4mb blocksize with mixed data and metadata disks, if you have a > reasonable number of SSD's put them in a raid 1 or raid 10 and use them as > dedicated metadata and the other disks as dataonly , but i would not use > the --metadata-block-size parameter as it prevents the datapool to use > large number of subblocks. > as long as your SSDs are on raid 1 or 10 there is no read/modify/write > penalty, so using them with the 4mb blocksize has no real negative impact > at least on controllers i have worked with. > > hope this helps. > > On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <*jam at ucar.edu* > > wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > > On 6/26/18 12:21 AM, Sven Oehme wrote: > Joseph, > > the subblocksize will be derived from the smallest blocksize in the > filesytem, given you specified a metadata block size of 512k thats what > will be used to calculate the number of subblocks, even your data pool is > 4mb. > is this setup for a traditional NSD Setup or for GNR as the > recommendations would be different. > > sven > > On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <*jam at ucar.edu* > > wrote: > Quick question, anyone know why GPFS wouldn't respect the default for > the subblocks-per-full-block parameter when creating a new filesystem? > I'd expect it to be set to 512 for an 8MB block size but my guess is > that also specifying a metadata-block-size is interfering with it (by > being too small). This was a parameter recommended by the vendor for a > 4.2 installation with metadata on dedicated SSDs in the system pool, any > best practices for 5.0? I'm guessing I'd have to bump it up to at least > 4MB to get 512 subblocks for both pools. > > fs1 created with: > # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j > cluster -n 9000 --metadata-block-size 512K --perfileset-quota > --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 > > # mmlsfs fs1 > > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes (system pool) > 131072 Minimum fragment (subblock) > size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > > -B 524288 Block size (system pool) > 8388608 Block size (other pools) > > -V 19.01 (5.0.1.0) File system version > > --subblocks-per-full-block 64 Number of subblocks per > full block > -P system;DATA Disk storage pools in file > system > > > Thanks! 
> --Joey Mendoza > NCAR > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantllama at gmail.com Mon Jul 2 10:57:11 2018 From: mutantllama at gmail.com (Carl) Date: Mon, 2 Jul 2018 19:57:11 +1000 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Thanks Olaf and Sven, It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata) is no longer relevant for version 5. Any idea if its likely to be updated soon? The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms there any situations where you would recommend using less than the new default block size? Cheers, Carl. On Mon, 2 Jul 2018 at 17:55, Sven Oehme wrote: > Olaf, he is talking about indirect size not subblock size . > > Carl, > > here is a screen shot of a 4mb filesystem : > > [root at p8n15hyp ~]# mmlsfs all_local > > File system attributes for /dev/fs2-4m-07: > ========================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) > size in bytes > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 1 Default number of metadata > replicas > -M 2 Maximum number of metadata > replicas > -r 1 Default number of data > replicas > -R 2 Maximum number of data > replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in > effect > -k all ACL semantics in effect > -n 512 Estimated number of nodes > that will mount file system > -B 4194304 Block size > -Q none Quotas accounting enabled > none Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Mon Jun 18 12:30:54 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation > option > --fastea Yes Fast external attributes > enabled? > --encryption No Encryption enabled? > --inode-limit 4000000000 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? 
> --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 512 Number of subblocks per full > block > -P system Disk storage pools in file > system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? > -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in > file system > -A no Automatic mount option > -o none Additional mount options > -T /gpfs/fs2-4m-07 Default mount point > --mount-priority 0 Mount priority > > as you can see indirect size is 32k > > sven > > On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: > >> HI Carl, >> 8k for 4 M Blocksize >> files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at >> least one "subblock" be allocated .. >> >> in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is >> retrieved from the blocksize ... >> since R >5 (so new created file systems) .. the new default block size is >> 4 MB, fragment size is 8k (512 subblocks) >> for even larger block sizes ... more subblocks are available per block >> so e.g. >> 8M .... 1024 subblocks (fragment size is 8 k again) >> >> @Sven.. correct me, if I'm wrong ... >> >> >> >> >> >> >> From: Carl >> >> To: gpfsug main discussion list >> Date: 07/02/2018 08:55 AM >> Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> Hi Sven, >> >> What is the resulting indirect-block size with a 4mb metadata block size? >> >> Does the new sub-block magic mean that it will take up 32k, or will it >> occupy 128k? >> >> Cheers, >> >> Carl. >> >> >> On Mon, 2 Jul 2018 at 15:26, Sven Oehme <*oehmes at gmail.com* >> > wrote: >> Hi, >> >> most traditional raid controllers can't deal well with blocksizes above >> 4m, which is why the new default is 4m and i would leave it at that unless >> you know for sure you get better performance with 8mb which typically >> requires your raid controller volume full block size to be 8mb with maybe a >> 8+2p @1mb strip size (many people confuse strip size with full track size) . >> if you don't have dedicated SSDs for metadata i would recommend to just >> use a 4mb blocksize with mixed data and metadata disks, if you have a >> reasonable number of SSD's put them in a raid 1 or raid 10 and use them as >> dedicated metadata and the other disks as dataonly , but i would not use >> the --metadata-block-size parameter as it prevents the datapool to use >> large number of subblocks. >> as long as your SSDs are on raid 1 or 10 there is no read/modify/write >> penalty, so using them with the 4mb blocksize has no real negative impact >> at least on controllers i have worked with. >> >> hope this helps. >> >> On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <*jam at ucar.edu* >> > wrote: >> Hi, it's for a traditional NSD setup. >> >> --Joey >> >> >> On 6/26/18 12:21 AM, Sven Oehme wrote: >> Joseph, >> >> the subblocksize will be derived from the smallest blocksize in the >> filesytem, given you specified a metadata block size of 512k thats what >> will be used to calculate the number of subblocks, even your data pool is >> 4mb. >> is this setup for a traditional NSD Setup or for GNR as the >> recommendations would be different. >> >> sven >> >> On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <*jam at ucar.edu* >> > wrote: >> Quick question, anyone know why GPFS wouldn't respect the default for >> the subblocks-per-full-block parameter when creating a new filesystem? 
>> I'd expect it to be set to 512 for an 8MB block size but my guess is >> that also specifying a metadata-block-size is interfering with it (by >> being too small). This was a parameter recommended by the vendor for a >> 4.2 installation with metadata on dedicated SSDs in the system pool, any >> best practices for 5.0? I'm guessing I'd have to bump it up to at least >> 4MB to get 512 subblocks for both pools. >> >> fs1 created with: >> # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j >> cluster -n 9000 --metadata-block-size 512K --perfileset-quota >> --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T >> /gpfs/fs1 >> >> # mmlsfs fs1 >> >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) >> size in bytes (system pool) >> 131072 Minimum fragment (subblock) >> size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> >> -B 524288 Block size (system pool) >> 8388608 Block size (other pools) >> >> -V 19.01 (5.0.1.0) File system version >> >> --subblocks-per-full-block 64 Number of subblocks per >> full block >> -P system;DATA Disk storage pools in file >> system >> >> >> Thanks! >> --Joey Mendoza >> NCAR >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at *spectrumscale.org* >> *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lore at cscs.ch Mon Jul 2 14:50:37 2018 From: lore at cscs.ch (Lo Re Giuseppe) Date: Mon, 2 Jul 2018 13:50:37 +0000 Subject: [gpfsug-discuss] Zimon metrics details Message-ID: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. 
Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Mon Jul 2 15:04:39 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 2 Jul 2018 07:04:39 -0700 Subject: [gpfsug-discuss] Zimon metrics details In-Reply-To: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> References: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> Message-ID: <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone > On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: > > Hi everybody, > > I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. > Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) > > Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? > The SS probelm determination guide doens?t spend more than half a line for each. > > In particular I would like to understand the difference between these ones: > > - gpfs_fs_bytes_read > - gpfs_fis_bytes_read > > The second gives tipically higher values than the first one. > > Thanks for any hit. > > Regards, > > Giuseppe > > *********************************************************************** > > Giuseppe Lo Re > > CSCS - Swiss National Supercomputing Center > > Via Trevano 131 > > CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 > > Switzerland Email: giuseppe.lore at cscs.ch > > *********************************************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From agar at us.ibm.com Mon Jul 2 16:05:33 2018 From: agar at us.ibm.com (Eric Agar) Date: Mon, 2 Jul 2018 11:05:33 -0400 Subject: [gpfsug-discuss] Zimon metrics details In-Reply-To: <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> References: <89EC4307-DDE4-42FD-B73A-12F79A3BA22F@cscs.ch> <523F9FE0-CA7D-4655-AFC5-BEBC1F56FC34@lbl.gov> Message-ID: Hello Giuseppe, Following was my attempt to answer a similar question some months ago. When reading about the different viewpoints of the Zimon sensors, please note that gpfs_fis_bytes_read is a metric provided by the GPFSFileSystemAPI sensor, while gpfs_fs_bytes_read is a metric provided by the GPFSFileSystem sensor. Therefore, gpfs_fis_bytes_read reflects application reads, while gpfs_fs_bytes_read reflects NSD reads. The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. 
An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause the writing of disk blocks, due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. Eric M. Agar agar at us.ibm.com IBM Spectrum Scale Level 2 Software Defined Infrastructure, IBM Systems From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sandeep.patil at in.ibm.com Mon Jul 2 19:43:20 2018 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Mon, 2 Jul 2018 18:43:20 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on Spectrum Scale (Q2 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Mon Jul 2 21:17:26 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 2 Jul 2018 22:17:26 +0200 Subject: [gpfsug-discuss] subblock sanity check in 5.0 In-Reply-To: References: <5f87dbf3-d0ab-1ef8-8861-c0c201d405f5@ucar.edu> Message-ID: Hi, Carl, Sven had mentioned the RMW penalty before which could make it beneficial to use smaller blocks. If you have traditional RAIDs and you go the usual route to do track sizes equal to the block size (stripe size = BS/n with n+p RAIDs), you may run into problems if your I/O are typically or very often smaller than a block because the controller needs to read the entire track, modifies it according to your I/O, and writes it back with the parity stripes. Example: with 4MiB BS and 8+2 RAIDS as NSDs, on each I/O smaller than 4MiB reaching an NSD the controller needs to read 4MiB into a buffer, modify it according to your I/O, calculate parity for the whole track and write back 5MiB (8 data stripes of 512kiB plus two parity stripes). In those cases you might be better off with smaller block sizes. In the above scenario, it might however still be ok to leave the block size at 4MiB and just reduce the track size of the RAIDs. One has to check how that affects performance, YMMV I'd say here. Mind that the ESS uses a clever way to mask these type of I/O from the n+p RS based vdisks, but even there one might need to think ... Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Carl To: gpfsug main discussion list Date: 02/07/2018 11:57 Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Olaf and Sven, It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata ) is no longer relevant for version 5. Any idea if its likely to be updated soon? The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms there any situations where you would recommend using less than the new default block size? Cheers, Carl. On Mon, 2 Jul 2018 at 17:55, Sven Oehme wrote: Olaf, he is talking about indirect size not subblock size . 
Carl, here is a screen shot of a 4mb filesystem : [root at p8n15hyp ~]# mmlsfs all_local File system attributes for /dev/fs2-4m-07: ========================================== flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 1 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 512 Estimated number of nodes that will mount file system -B 4194304 Block size -Q none Quotas accounting enabled none Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Mon Jun 18 12:30:54 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 4000000000 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 512 Number of subblocks per full block -P system Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d RG001VS001;RG002VS001;RG003VS002;RG004VS002 Disks in file system -A no Automatic mount option -o none Additional mount options -T /gpfs/fs2-4m-07 Default mount point --mount-priority 0 Mount priority as you can see indirect size is 32k sven On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser wrote: HI Carl, 8k for 4 M Blocksize files < ~3,x KB fits into the inode , for "larger" files (> 3,x KB) at least one "subblock" be allocated .. in R < 5.x ... it was fixed 1/32 from blocksize so subblocksize is retrieved from the blocksize ... since R >5 (so new created file systems) .. the new default block size is 4 MB, fragment size is 8k (512 subblocks) for even larger block sizes ... more subblocks are available per block so e.g. 8M .... 1024 subblocks (fragment size is 8 k again) @Sven.. correct me, if I'm wrong ... From: Carl To: gpfsug main discussion list Date: 07/02/2018 08:55 AM Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Sven, What is the resulting indirect-block size with a 4mb metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k? Cheers, Carl. On Mon, 2 Jul 2018 at 15:26, Sven Oehme wrote: Hi, most traditional raid controllers can't deal well with blocksizes above 4m, which is why the new default is 4m and i would leave it at that unless you know for sure you get better performance with 8mb which typically requires your raid controller volume full block size to be 8mb with maybe a 8+2p @1mb strip size (many people confuse strip size with full track size) . 
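To put rough numbers on the read-modify-write penalty Uwe describes earlier in this thread, here is a back-of-the-envelope sketch in shell arithmetic using his 8+2 example (4 MiB block, 512 KiB strips, a 64 KiB application write); illustrative only, not a measurement:

# 8+2P RAID6 LUN, full track = GPFS block size = 4 MiB, per-disk strip = 512 KiB
track=$((4*1024*1024)); strip=$((512*1024)); io=$((64*1024))
# an I/O smaller than the block forces the controller to read the whole track,
# merge the new data, recompute parity, and write the data plus parity strips back
read_back=$track
write_back=$((track + 2*strip))          # 8 data strips + 2 parity strips
echo "$((io/1024)) KiB write -> read $((read_back/1024)) KiB, write back $((write_back/1024)) KiB"

which reproduces the 4 MiB read / 5 MiB write-back amplification described above; shrinking the RAID track (or the block size) shrinks that cost for small I/O.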
if you don't have dedicated SSDs for metadata i would recommend to just use a 4mb blocksize with mixed data and metadata disks, if you have a reasonable number of SSD's put them in a raid 1 or raid 10 and use them as dedicated metadata and the other disks as dataonly , but i would not use the --metadata-block-size parameter as it prevents the datapool to use large number of subblocks. as long as your SSDs are on raid 1 or 10 there is no read/modify/write penalty, so using them with the 4mb blocksize has no real negative impact at least on controllers i have worked with. hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: Hi, it's for a traditional NSD setup. --Joey On 6/26/18 12:21 AM, Sven Oehme wrote: Joseph, the subblocksize will be derived from the smallest blocksize in the filesytem, given you specified a metadata block size of 512k thats what will be used to calculate the number of subblocks, even your data pool is 4mb. is this setup for a traditional NSD Setup or for GNR as the recommendations would be different. sven On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza wrote: Quick question, anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool, any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools. fs1 created with: # mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1 # mmlsfs fs1 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 131072 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -B 524288 Block size (system pool) 8388608 Block size (other pools) -V 19.01 (5.0.1.0) File system version --subblocks-per-full-block 64 Number of subblocks per full block -P system;DATA Disk storage pools in file system Thanks! 
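As a companion to the explanation from Sven and Olaf above, a small sketch of the subblock arithmetic under the simplified rule that the subblock count is derived from the smallest block size in the file system; the figures are taken from the two mmlsfs outputs in this thread and the 8 KiB fragment they both show:

# mixed layout: 512 KiB metadata (system pool) blocks, 8 MiB data blocks
meta_bs=$((512*1024)); data_bs=$((8*1024*1024)); frag=8192
subblocks=$((meta_bs / frag))            # 512K / 8K = 64 subblocks per full block
data_frag=$((data_bs / subblocks))       # 8M / 64   = 128 KiB fragments in the data pool
echo "subblocks-per-full-block=$subblocks  data-pool fragment=$data_frag bytes"
# single 4 MiB block size (Sven's fs2-4m-07 example) for comparison
echo "4 MiB-only file system: $((4*1024*1024 / frag)) subblocks, $frag-byte fragments"

This matches the 64-subblock / 131072-byte result Joseph sees with --metadata-block-size 512K, versus 512 subblocks and 8192-byte fragments when the whole file system uses 4 MiB blocks.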
--Joey Mendoza NCAR _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From lore at cscs.ch Tue Jul 3 09:05:41 2018 From: lore at cscs.ch (Lo Re Giuseppe) Date: Tue, 3 Jul 2018 08:05:41 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 78, Issue 6 In-Reply-To: References: Message-ID: Dear Eric, thanks a lot for this information. And what about the gpfs_vfs metric group? What is the difference beteween for example ?gpfs_fis_read_calls" and ?gpfs_vfs_read? ? Again I see the second one being tipically higher than the first one. In addition gpfs_vfs_read is not related to a specific file system... [root at ela5 ~]# mmperfmon query gpfs_fis_read_calls -n1 -b 60 Legend: 1: ela5.cscs.ch|GPFSFilesystemAPI|durand.cscs.ch|store|gpfs_fis_read_calls 2: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|apps|gpfs_fis_read_calls 3: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|project|gpfs_fis_read_calls 4: ela5.cscs.ch|GPFSFilesystemAPI|por.login.cscs.ch|users|gpfs_fis_read_calls Row Timestamp gpfs_fis_read_calls gpfs_fis_read_calls gpfs_fis_read_calls gpfs_fis_read_calls 1 2018-07-03-10:03:00 0 0 7274 0 [root at ela5 ~]# mmperfmon query gpfs_vfs_read -n1 -b 60 Legend: 1: ela5.cscs.ch|GPFSVFS|gpfs_vfs_read Row Timestamp gpfs_vfs_read 1 2018-07-03-10:03:00 45123 Cheers, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** Hello Giuseppe, Following was my attempt to answer a similar question some months ago. When reading about the different viewpoints of the Zimon sensors, please note that gpfs_fis_bytes_read is a metric provided by the GPFSFileSystemAPI sensor, while gpfs_fs_bytes_read is a metric provided by the GPFSFileSystem sensor. Therefore, gpfs_fis_bytes_read reflects application reads, while gpfs_fs_bytes_read reflects NSD reads. The GPFSFileSystemAPI and GPFSNodeAPI sensor metrics are from the point of view of "applications" in the sense that they provide stats about I/O requests made to files in GPFS file systems from user level applications using POSIX interfaces like open(), close(), read(), write(), etc. This is in contrast to similarly named sensors without the "API" suffix, like GPFSFilesystem and GPFSNode. 
Those sensors provide stats about I/O requests made by the GPFS code to NSDs (disks) making up GPFS file systems. The relationship between application I/O and disk I/O might or might not be obvious. Consider some examples. An application that starts sequentially reading a file might, at least initially, cause more disk I/O than expected because GPFS has decided to prefetch data. An application write() might not immediately cause the writing of disk blocks, due to the operation of the pagepool. Ultimately, application write()s might cause twice as much data written to disk due to the replication factor of the file system. Application I/O concerns itself with user data; disk I/O might have to occur to handle the user data and associated file system metadata (like inodes and indirect blocks). The difference between GPFSFileSystemAPI and GPFSNodeAPI: GPFSFileSystemAPI reports stats for application I/O per filesystem per node; GPFSNodeAPI reports application I/O stats per node. Similarly, GPFSFilesystem reports stats for disk I/O per filesystem per node; GPFSNode reports disk I/O stats per node. Eric M. Agar agar at us.ibm.com IBM Spectrum Scale Level 2 Software Defined Infrastructure, IBM Systems From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe *********************************************************************** Giuseppe Lo Re CSCS - Swiss National Supercomputing Center Via Trevano 131 CH-6900 Lugano (TI) Tel: + 41 (0)91 610 8225 Switzerland Email: giuseppe.lore at cscs.ch *********************************************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 6 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... 
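A practical way to see the two viewpoints described above side by side is to query one metric from each sensor group over the same window. The lines below simply reuse the mmperfmon query form already shown in this thread (same -n1 -b 60 options), swapping in the two byte counters Giuseppe asked about; that both sensors are enabled on your collector is an assumption here:

# application-side reads per file system (GPFSFileSystemAPI sensor)
mmperfmon query gpfs_fis_bytes_read -n1 -b 60
# NSD/disk-side reads for the same file systems (GPFSFileSystem sensor)
mmperfmon query gpfs_fs_bytes_read -n1 -b 60

Prefetch, pagepool write-behind, replication and metadata traffic are the usual reasons the two counters diverge, as Eric explains above.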
URL: From Cameron.Dunn at bristol.ac.uk Tue Jul 3 12:49:03 2018 From: Cameron.Dunn at bristol.ac.uk (Cameron Dunn) Date: Tue, 3 Jul 2018 11:49:03 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms Message-ID: HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Jul 3 20:37:08 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 3 Jul 2018 19:37:08 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 17:43:20 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 16:43:20 +0000 Subject: [gpfsug-discuss] High I/O wait times Message-ID: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. 
Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Jul 3 21:11:17 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 3 Jul 2018 16:11:17 -0400 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 22:41:17 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 21:41:17 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). 
In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Jul 3 22:53:19 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 3 Jul 2018 17:53:19 -0400 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Message-ID: How many NSDs are served by the NSD servers and what is your maximum file system block size? Have you confirmed that you have sufficient NSD worker threads to handle the maximum number of IOs you are configured to have active? That would be the number of NSDs served times 12 (you have 12 threads per queue). Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 05:41 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. 
I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jul 3 23:05:25 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 3 Jul 2018 22:05:25 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <9877F844-6C60-42CD-8902-67C255F3ABD8@vanderbilt.edu> Message-ID: <2CB5B62E-A40A-4C47-B2D1-137BE87FBDDA@vanderbilt.edu> Hi Fred, I have a total of 48 NSDs served up by 8 NSD servers. 12 of those NSDs are in our small /home filesystem, which is performing just fine. The other 36 are in our ~1 PB /scratch and /data filesystem, which is where the problem is. Our max filesystem block size parameter is set to 16 MB, but the aforementioned filesystem uses a 1 MB block size. nsdMaxWorkerThreads is set to 1024 as shown below. Since each NSD server serves an average of 6 NSDs and 6 x 12 = 72 we?re OK if I?m understanding the calculation correctly. Even multiplying 48 x 12 = 576, so we?re good?!? Your help is much appreciated! Thanks again? Kevin On Jul 3, 2018, at 4:53 PM, Frederick Stock > wrote: How many NSDs are served by the NSD servers and what is your maximum file system block size? Have you confirmed that you have sufficient NSD worker threads to handle the maximum number of IOs you are configured to have active? That would be the number of NSDs served times 12 (you have 12 threads per queue). Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/03/2018 05:41 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Fred, Thanks for the response. I have been looking at the ?mmfsadm dump nsd? data from the two NSD servers that serve up the two NSDs that most commonly experience high wait times (although, again, this varies from time to time). In addition, I have been reading: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning And: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning Which seem to be the most relevant documents on the Wiki. I would like to do a more detailed analysis of the ?mmfsadm dump nsd? output, but my preliminary looks at it seems to indicate that I see I/O?s queueing in the 50 - 100 range for the small queues and the 60 - 200 range on the large queues. In addition, I am regularly seeing all 12 threads on the LARGE queues active, while it is much more rare that I see all - or even close to all - the threads on the SMALL queues active. As far as the parameters Scott and Yuri mention, on our cluster they are set thusly: [common] nsdMaxWorkerThreads 640 [] nsdMaxWorkerThreads 1024 [common] nsdThreadsPerQueue 4 [] nsdThreadsPerQueue 12 [common] nsdSmallThreadRatio 3 [] nsdSmallThreadRatio 1 So to me it sounds like I need more resources on the LARGE queue side of things ? i.e. it sure doesn?t sound like I want to change my small thread ratio. If I increase the amount of threads it sounds like that might help, but that also takes more pagepool, and I?ve got limited RAM in these (old) NSD servers. I do have nsdbufspace set to 70, but I?ve only got 16-24 GB RAM each in these NSD servers. 
And a while back I did try increase the page pool on them (very slightly) and ended up causing problems because then they ran out of physical RAM. Thoughts? Followup questions? Thanks! Kevin On Jul 3, 2018, at 3:11 PM, Frederick Stock > wrote: Are you seeing similar values for all the nodes or just some of them? One possible issue is how the NSD queues are configured on the NSD servers. You can see this with the output of "mmfsadm dump nsd". There are queues for LARGE IOs (greater than 64K) and queues for SMALL IOs (64K or less). Check the highest pending values to see if many IOs are queueing. There are a couple of options to fix this but rather than explain them I suggest you look for information about NSD queueing on the developerWorks site. There has been information posted there that should prove helpful. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/03/2018 03:49 PM Subject: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014&sdata=wIyB66HoqvL13I3LX0Ott%2Btr7HQQdInZ028QUp0QMhE%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110933587&sdata=RKuWKLRGoBRMSDHkrMsKsuU6JkiFgruK4e7gGafxAGc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
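For reference, the sizing rule of thumb Fred and Kevin walk through above can be written out as a quick sanity check; the figures are the ones quoted in this thread (48 NSDs across 8 servers, 12 threads per queue, nsdMaxWorkerThreads 1024):

nsds=48; servers=8; threads_per_queue=12; nsd_max_worker_threads=1024
echo "per-server: $((nsds / servers)) NSDs x $threads_per_queue threads = $(( nsds / servers * threads_per_queue ))"
echo "worst case: $nsds NSDs x $threads_per_queue threads = $((nsds * threads_per_queue))"
echo "configured: nsdMaxWorkerThreads = $nsd_max_worker_threads"

Both results (72 and 576) stay below the configured 1024, consistent with Kevin's reading that raw thread count is not the limit here.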
URL: From scrusan at ddn.com Tue Jul 3 23:01:48 2018 From: scrusan at ddn.com (Steve Crusan) Date: Tue, 3 Jul 2018 22:01:48 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Kevin, While this is happening, are you able to grab latency stats per LUN (hardware vendor agnostic) to see if there are any outliers? Also, when looking at the mmdiag output, are both reads and writes affected? Depending on the storage hardware, your writes might be hitting cache, so maybe this problem is being exasperated by many small reads (that are too random to be coalesced, take advantage of drive NCQ, etc). The other response about the nsd threads is also a good start, but if the I/O waits shift between different NSD servers and across hardware vendors, my assumption would be that you are hitting a bottleneck somewhere, but what you are seeing is symptoms of I/O backlog, which can manifest at any number of places. This could be something as low level as a few slow drives. Have you just started noticing this behavior? Any new applications on your system? Going by your institution, you're probably supposing a wide variety of codes, so if these problems just started happening, its possible that someone changed their code, or decided to run new scientific packages. -Steve ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: Tuesday, July 03, 2018 11:43 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] High I/O wait times Hi all, We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From taylorm at us.ibm.com Tue Jul 3 23:25:55 2018 From: taylorm at us.ibm.com (Michael L Taylor) Date: Tue, 3 Jul 2018 15:25:55 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 78, Issue 6 In-Reply-To: References: Message-ID: Hi Giuseppe, The GUI happens to document some of the zimon metrics in the KC here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1hlp_monperfmetrics.htm Hopefully that gets you a bit more of what you need but does not cover everything. Today's Topics: 1. Zimon metrics details (Lo Re Giuseppe) 2. Re: Zimon metrics details (Kristy Kallback-Rose) 3. 
Re: Zimon metrics details (Eric Agar) From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 07/02/2018 10:06 AM Subject: Re: [gpfsug-discuss] Zimon metrics details Sent by: gpfsug-discuss-bounces at spectrumscale.org +1 Would love to see more detailed descriptions on Zimon metrics. Sent from my iPhone On Jul 2, 2018, at 6:50 AM, Lo Re Giuseppe wrote: Hi everybody, I am extracting the Zimon performance data and uploading them to our elasticsearch cluster. Now that I have the mechanism in place it?s time to understand what I am actually uploading ;) Maybe this has been already asked.. where can I find a (as much as possible) detailed explaination of the different Zimon metrics? The SS probelm determination guide doens?t spend more than half a line for each. In particular I would like to understand the difference between these ones: - gpfs_fs_bytes_read - gpfs_fis_bytes_read The second gives tipically higher values than the first one. Thanks for any hit. Regards, Giuseppe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Wed Jul 4 06:47:28 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 4 Jul 2018 05:47:28 +0000 Subject: [gpfsug-discuss] Filesystem Operation error Message-ID: <254f2811c2b14c9d8c82403d393d0178@SMXRF105.msg.hukrf.de> Hallo All, follow a short story from yesterday on Version 5.0.1.1. We had a 3 - Node cluster (2 Nodes for IO and the third for a quorum Buster function). A Admin make a mistake an take a delete of the 3 Node (VM). We restored ist with a VM Snapshot no Problem. The only point here we lost complete 7 desconly disk. We defined new one and want to delete this disk with mmdeldisk. On 6 Filesystems no problem but one has now a Problem. We delete this disk finaly with mmdeldisk fsname -p. And we see now after a successfully mmdelnsd the old disk already in following display. mmlsdisk tsmconf -L disk driver sector failure holds holds storage name type size group metadata data status availability disk id pool remarks ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- nsd_tsmconf001_DSK20 nsd 512 0 Yes Yes ready up 1 system desc nsd_g4_tsmconf nsd 512 2 No No removing refs down 2 system nsd_tsmconf001_DSK70 nsd 512 1 Yes Yes ready up 3 system desc nsd_g4_tsmconf1 nsd 512 2 No No ready up 4 system desc After that all fs-cmd geneate a fs operation error here like this. Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=3882673: Unrecoverable file system operation error. Status code 65536. Volume tsmconf Questions: 1. What does this mean ?removing refs?. Now we don?t have the possibility to handle these disk. The disk itself is no more existend, but in the stripegroup a referenz is available. nsd_g4_tsmconf: uid 0A885085:577BB637, status ReferencesBeingRemoved, availability Unavailable, created on node 10.136.80.133, Tue Jul 5 15:29:27 2016 type 'nsd', sector size 512, failureConfigVersion 424 quorum weight {0,0}, failure group: id 2, fg index 1 locality group: id 2, lg index 1 failureGroupStrP: (2), rackId 2, locationId 0, extLgId 0 nSectors 528384 (0:81000) (258 MB), inode0Sector 131072 alloc region: no of bits 0, seg num -1, offset 0, len 72 suballocator 0x18015B8A7A4 type 0 nBits 32 subSize 0 dataOffset 4 nRows 0 len/off: storage pool: 0 holds nothing sectors past efficient device boundary: 0 isFenced: 1 start Region No: -1 end Region No:-1 start AllocMap Record: -1 2. Are there any cmd to handle these? 3. 
Where can I find the Status code 65536? A PMR is also open. Any Hints? Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tees at us.ibm.com Wed Jul 4 03:43:28 2018 From: tees at us.ibm.com (Stephen M Tee) Date: Tue, 3 Jul 2018 21:43:28 -0500 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: You dont state whether your running GPFS or ESS and which level. One thing you can check, is whether the SES and enclosure drivers are being loaded. The lsmod command will show if they are. These drivers were found to cause SCSI IO hangs in Linux RH7.3 and 7.4. If they are being loaded, you can blacklist and unload them with no impact to ESS/GNR By default these drivers are blacklisted in ESS. Stephen Tee ESS Storage Development IBM Systems and Technology Austin, TX 512-963-7177 From: Steve Crusan To: gpfsug main discussion list Date: 07/03/2018 05:08 PM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org Kevin, While this is happening, are you able to grab latency stats per LUN (hardware vendor agnostic) to see if there are any outliers? Also, when looking at the mmdiag output, are both reads and writes affected? Depending on the storage hardware, your writes might be hitting cache, so maybe this problem is being exasperated by many small reads (that are too random to be coalesced, take advantage of drive NCQ, etc). The other response about the nsd threads is also a good start, but if the I/O waits shift between different NSD servers and across hardware vendors, my assumption would be that you are hitting a bottleneck somewhere, but what you are seeing is symptoms of I/O backlog, which can manifest at any number of places. This could be something as low level as a few slow drives. Have you just started noticing this behavior? Any new applications on your system? 
Going by your institution, you're probably supposing a wide variety of codes, so if these problems just started happening, its possible that someone changed their code, or decided to run new scientific packages. -Steve ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: Tuesday, July 03, 2018 11:43 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] High I/O wait times Hi all, not We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Wed Jul 4 13:34:43 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Wed, 4 Jul 2018 08:34:43 -0400 (EDT) Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: Hi Kevin, Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > Hi all, > We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. ?One of the > confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from > NSD to NSD (and storage array to storage array) whenever we check ? 
which is sometimes just a few minutes apart. > > In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. ?In our environment, the most common cause has > been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. ?But that?s *not* happening this time. > Is there anything within GPFS / outside of a hardware issue that I should be looking for?? ?Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu?- (615)875-9633 > > > > > From Renar.Grunenberg at huk-coburg.de Thu Jul 5 08:02:36 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 5 Jul 2018 07:02:36 +0000 Subject: [gpfsug-discuss] Filesystem Operation error In-Reply-To: <037a7d7f52bf4a6a83406c8c26fa4d82@SMXRF105.msg.hukrf.de> References: <037a7d7f52bf4a6a83406c8c26fa4d82@SMXRF105.msg.hukrf.de> Message-ID: <8fb424ee10404400ac6b81d985dd5bf9@SMXRF105.msg.hukrf.de> Hallo All, we fixed our Problem here with Spectrum Scale Support. The fixing cmd were ?mmcommon recoverfs tsmconf? and ?tsdeldisk tsmconf -d "nsd_g4_tsmconf". The final reason for this problem, if I want to delete a disk in a filesystem all disk must be reachable from the requesting host. In our config the NSD-Server had no NSD-Server Definitions and the Quorum Buster Node had no access to the SAN attached disk. A Recommendation from my site here are: This should be documented for a high available config with a 3 side implementation, or the cmds that want to update the nsd-descriptors for each disk should check are any disk reachable and don?t do a SG-Panic. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Mittwoch, 4. Juli 2018 07:47 An: 'gpfsug-discuss at spectrumscale.org' Betreff: Filesystem Operation error Hallo All, follow a short story from yesterday on Version 5.0.1.1. We had a 3 - Node cluster (2 Nodes for IO and the third for a quorum Buster function). A Admin make a mistake an take a delete of the 3 Node (VM). 
We restored ist with a VM Snapshot no Problem. The only point here we lost complete 7 desconly disk. We defined new one and want to delete this disk with mmdeldisk. On 6 Filesystems no problem but one has now a Problem. We delete this disk finaly with mmdeldisk fsname -p. And we see now after a successfully mmdelnsd the old disk already in following display. mmlsdisk tsmconf -L disk driver sector failure holds holds storage name type size group metadata data status availability disk id pool remarks ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ --------- nsd_tsmconf001_DSK20 nsd 512 0 Yes Yes ready up 1 system desc nsd_g4_tsmconf nsd 512 2 No No removing refs down 2 system nsd_tsmconf001_DSK70 nsd 512 1 Yes Yes ready up 3 system desc nsd_g4_tsmconf1 nsd 512 2 No No ready up 4 system desc After that all fs-cmd geneate a fs operation error here like this. Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=3882673: Unrecoverable file system operation error. Status code 65536. Volume tsmconf Questions: 1. What does this mean ?removing refs?. Now we don?t have the possibility to handle these disk. The disk itself is no more existend, but in the stripegroup a referenz is available. nsd_g4_tsmconf: uid 0A885085:577BB637, status ReferencesBeingRemoved, availability Unavailable, created on node 10.136.80.133, Tue Jul 5 15:29:27 2016 type 'nsd', sector size 512, failureConfigVersion 424 quorum weight {0,0}, failure group: id 2, fg index 1 locality group: id 2, lg index 1 failureGroupStrP: (2), rackId 2, locationId 0, extLgId 0 nSectors 528384 (0:81000) (258 MB), inode0Sector 131072 alloc region: no of bits 0, seg num -1, offset 0, len 72 suballocator 0x18015B8A7A4 type 0 nBits 32 subSize 0 dataOffset 4 nRows 0 len/off: storage pool: 0 holds nothing sectors past efficient device boundary: 0 isFenced: 1 start Region No: -1 end Region No:-1 start AllocMap Record: -1 2. Are there any cmd to handle these? 3. Where can I find the Status code 65536? A PMR is also open. Any Hints? Regards Renar -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Thu Jul 5 09:28:51 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 5 Jul 2018 08:28:51 +0000 Subject: [gpfsug-discuss] How to get rid of very old mmhealth events In-Reply-To: References: <83A6EEB0EC738F459A39439733AE804526727CB4@MBX114.d.ethz.ch> , Message-ID: <83A6EEB0EC738F459A39439733AE804526729376@MBX114.d.ethz.ch> Hello Daniel, I've solved my problem disabling the check (I've gpfs v4.2.3-5) by putting ib_rdma_enable_monitoring=False in the [network] section of the file /var/mmfs/mmsysmon/mmsysmonitor.conf, and restarting the mmsysmonitor. There was a thread in this group about this problem. 
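For anyone chasing the same stale ib_rdma_* events, the workaround described above amounts to an edit along these lines (excerpt only; the option name and [network] section are exactly as stated, the rest of the file is left untouched), followed by restarting the monitor, e.g. with the mmsysmoncontrol restart command mentioned further down this thread:

# /var/mmfs/mmsysmon/mmsysmonitor.conf (excerpt)
[network]
ib_rdma_enable_monitoring=False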
A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Yaron Daniel [YARD at il.ibm.com] Sent: Sunday, July 01, 2018 7:17 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Hi There is was issue with Scale 5.x GUI error - ib_rdma_nic_unrecognized(mlx5_0/2) Check if you have the patch: [root at gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py 229c229,230 < recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) --- > #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm)) > recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm)) And restart the - mmsysmoncontrol restart Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0B5B5F080B5B5954005EFD8BC22582BD] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_06EDAF6406EDA744005EFD8BC22582BD][cid:_1_06EDB16C06EDA744005EFD8BC22582BD] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 06/28/2018 11:16 AM Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly. Regards Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Dorigo Alvise (PSI)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] How to get rid of very old mmhealth events Date: Thu, Jun 28, 2018 6:08 PM Dear experts, I've e GL2 IBM system running SpectrumScale v4.2.3-6 (RHEL 7.3). The system is working properly but I get a DEGRADED status report for the NETWORK running the command mmhealth: [root at sf-gssio1 ~]# mmhealth node show Node name: sf-gssio1.psi.ch Node status: DEGRADED Status Change: 23 min. ago Component Status Status Change Reasons ------------------------------------------------------------------------------------------------------------------------------------------- GPFS HEALTHY 22 min. ago - NETWORK DEGRADED 145 days ago ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2) [...] This event is clearly an outlier because the network, verbs and IB are correctly working: [root at sf-gssio1 ~]# mmfsadm test verbs status VERBS RDMA status: started [root at sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1 verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2] [root at sf-gssio1 ~]# mmdiag --config|grep verbsPorts ! 
verbsPorts mlx5_0/1

[root at sf-gssio1 ~]# ibstat mlx5_0
CA 'mlx5_0'
    CA type: MT4113
    Number of ports: 2
    Firmware version: 10.16.1020
    Hardware version: 0
    Node GUID: 0xec0d9a03002b5db0
    System image GUID: 0xec0d9a03002b5db0
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 42
        LMC: 0
        SM lid: 1
        Capability mask: 0x26516848
        Port GUID: 0xec0d9a03002b5db0
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 65535
        LMC: 0
        SM lid: 0
        Capability mask: 0x26516848
        Port GUID: 0xec0d9a03002b5db8
        Link layer: InfiniBand

That event has been there for 145 days and it didn't go away after a daemon restart (mmshutdown/mmstartup). My question is: how can I get rid of this event and restore mmhealth's output to HEALTHY? This is important because I have nagios sensors that periodically parse the "mmhealth -Y ..." output, and at the moment I have to disable their email notification (which is not good if some real bad event happens).
Thanks,
Alvise
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed... (seven image attachments, ATT00001.gif - ATT00008.jpg)
From michael.holliday at crick.ac.uk Wed Jul 4 12:37:52 2018
From: michael.holliday at crick.ac.uk (Michael Holliday)
Date: Wed, 4 Jul 2018 11:37:52 +0000
Subject: [gpfsug-discuss] GPFS Windows Mount
In-Reply-To:
References:
Message-ID:

Hi All,
Those commands show no errors, nor do any of the log files. GPFS has started correctly and is showing the cluster and all nodes as up and active.
We appear to have found the command that is hanging during the mount - however I'm not sure why it's hanging:
mmwmi mountedfilesystems
Michael

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Yaron Daniel
Sent: 20 June 2018 16:36
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS Windows Mount

Also what does mmdiag --network + mmgetstate -a show ?
Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.gif at 01D41393.D1DEB220] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:image004.gif at 01D41393.D1DEB220][cid:image005.gif at 01D41393.D1DEB220] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: "Yaron Daniel" > To: gpfsug main discussion list > Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:image001.gif at 01D41393.D1DEB220] Storage Architect - IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:image004.gif at 01D41393.D1DEB220][cid:image005.gif at 01D41393.D1DEB220][https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Michael Holliday > To: "gpfsug-discuss at spectrumscale.org" > Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, We've being trying to get the windows system to mount GPFS. We've set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing - GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 1851 bytes Desc: image001.gif URL:
(Six further image attachments, image002.gif - image007.jpg, were also scrubbed.)
From heiner.billich at psi.ch Thu Jul 5 17:00:08 2018
From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI))
Date: Thu, 5 Jul 2018 16:00:08 +0000
Subject: [gpfsug-discuss] -o syncnfs has no effect?
Message-ID:

Hello,

I am trying to mount a fs with "-o syncnfs", as we'll export it with CES/Protocols. But I never see the mount option displayed when I do

# mount | grep fs-name

This is a remote cluster mount; we'll run the Protocol nodes in a separate cluster. On the home cluster I do see the option 'nfssync' in the output of 'mount'. My conclusion is that the mount option "syncnfs" has no effect on remote cluster mounts, which seems a bit strange. Can someone please clarify this? What is the impact on protocol nodes exporting remote cluster mounts? Is there any chance of data corruption? Or are some mount options implicitly inherited from the home cluster? I've read that 'syncnfs' is the default on Linux, but I would like to know for sure.

Funnily enough, I can pass arbitrary options with

# mmmount -o some-garbage

and they are silently ignored.

I did 'mmchfs -o syncnfs' on the home cluster, and the syncnfs option is present in /etc/fstab on the remote cluster. I did not remount on all nodes, though.

Thank you, I'd appreciate any hints or replies.

Heiner

Versions:
Remote cluster 5.0.1 on RHEL7.4 (mounts the fs and runs the protocol nodes)
Home cluster 4.2.3-8 on RHEL6 (exports the fs, owns the storage)
Filesystem: 17.00 (4.2.3.0)
All Linux x86_64 with Spectrum Scale Standard Edition
--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232 Villigen PSI
056 310 36 02
https://www.psi.ch

From emanners at fsu.edu Thu Jul 5 19:53:36 2018
From: emanners at fsu.edu (Edson Manners)
Date: Thu, 5 Jul 2018 14:53:36 -0400
Subject: [gpfsug-discuss] GPFS GUI
Message-ID: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu>

There was another thread on here about the following error in the GUI:

Event name: gui_cluster_down
Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running.

But it looks like the resolution happened in another channel. I have the exact same problem even though we're running a production GPFS cluster that seems to work perfectly fine. This is the last error in the GUI that I'm trying to get solved. What would be the best way to troubleshoot this?
-- [Any errors in spelling, tact or fact are transmission errors] - (Stolen from) Dag Wieers Edson Manners Research Computing Center FSU Information Technology Services Dirac Science Library., Room 150G Tallahassee, Florida 32306-4120 From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 6 02:11:17 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 6 Jul 2018 01:11:17 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> Message-ID: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 
348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. 
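A minimal sketch of the kind of "mmfsadm dump nsd" summarizer mentioned above might look like the following Python. The dump format is undocumented and release-dependent, so the phrases matched below ("requests pending", "threads active") are assumptions taken from the per-server summaries quoted earlier, not an exact grammar; adjust the patterns to whatever your dump actually prints.

#!/usr/bin/env python3
# Rough sketch: aggregate SMALL/LARGE NSD queue pressure from "mmfsadm dump nsd".
# The matched phrases are assumptions based on this thread, not an official format.
import re
import subprocess
from collections import defaultdict

def summarize(dump_text):
    totals = defaultdict(lambda: {'queues': 0, 'pending': 0, 'active': 0})
    current = None
    for line in dump_text.splitlines():
        qtype = re.search(r'\b(SMALL|LARGE)\b', line)
        if qtype and 'queue' in line.lower():
            current = qtype.group(1)
            totals[current]['queues'] += 1
        if current is None:
            continue
        pending = re.search(r'(\d+)\s+requests? pending', line)
        if pending:
            totals[current]['pending'] += int(pending.group(1))
        active = re.search(r'(\d+)\s+threads? active', line)
        if active:
            totals[current]['active'] += int(active.group(1))
    return totals

if __name__ == '__main__':
    dump = subprocess.check_output(
        ['/usr/lpp/mmfs/bin/mmfsadm', 'dump', 'nsd']).decode(errors='replace')
    for qtype, t in sorted(summarize(dump).items()):
        print('{0}: {queues} queues, {pending} requests pending, '
              '{active} threads active'.format(qtype, **t))

Run on an NSD server, this would print one aggregate line per queue type, comparable to the per-server summaries listed above.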
If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 From andreas.koeninger at de.ibm.com Fri Jul 6 07:38:07 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Fri, 6 Jul 2018 06:38:07 +0000 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> References: <756966bc-5287-abf7-6531-4b249b0687e5@fsu.edu> Message-ID: An HTML attachment was scrubbed... URL: From jjdoherty at yahoo.com Fri Jul 6 14:02:38 2018 From: jjdoherty at yahoo.com (Jim Doherty) Date: Fri, 6 Jul 2018 13:02:38 +0000 (UTC) Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: <733478365.61492.1530882158667@mail.yahoo.com> You may want to get an mmtrace,? but I suspect that the disk IOs are slow.???? The iohist is showing the time from when the start IO was issued until it was finished.??? 
Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it.??? If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue.? While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week.? You?re correct about our mixed workload.? There have been no new workloads that I am aware of. Stephen - no, this is not an ESS.? We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN.? Commodity hardware for the servers and storage.? We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks.? Linux multipathing handles path failures.? 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time).? So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array.? As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output.? We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. ??? 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. ??? 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. ??? 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. ??? 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. ??? 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. ??? 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. ??? 504 threads started, 8 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. ??? 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. ??? 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. ??? 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. ??? 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. ??? 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. ??? 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. ??? 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. ??? 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. ??? 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity.? Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized.? And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB.? That has not made a difference.? How can I determine how much of the pagepool is actually being used, BTW?? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns.? The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way.? The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now.? If you have read this entire very long e-mail, first off, thank you!? If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! 
Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why.? One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related.? In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk.? But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for??? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From emanners at fsu.edu Fri Jul 6 14:05:32 2018 From: emanners at fsu.edu (Edson Manners) Date: Fri, 6 Jul 2018 13:05:32 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: Ok. I'm on 4.2.3-5. So would this bug still show up if my remote filesystem is mounted? Because it is. Thanks. On 7/6/2018 2:38:21 AM, Andreas Koeninger wrote: Which version are you using? There was a bug in 4.2.3.6 and before related to unmounted remote filesystems which could lead to a gui_cluster_down event on the local cluster. 
Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: Edson Manners Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [gpfsug-discuss] GPFS GUI Date: Thu, Jul 5, 2018 11:38 PM There was another thread on here about the following error in the GUI: Event name: gui_cluster_down Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. But it looks like the resolution happened in another channel. I have the exact same problem even though we're running a production GPFS cluster that seems to work perfectly fine. This is the last error in the GUI that I'm trying to get solved. What would be the best way to try to troubleshoot this. -- [Any errors in spelling, tact or fact are transmission errors] - (Stolen from) Dag Wieers Edson Manners Research Computing Center FSU Information Technology Services Dirac Science Library., Room 150G Tallahassee, Florida 32306-4120 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Fri Jul 6 14:31:32 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Fri, 6 Jul 2018 13:31:32 +0000 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 6 15:27:51 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 6 Jul 2018 14:27:51 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <733478365.61492.1530882158667@mail.yahoo.com> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: Hi Jim, Thank you for your response. We are taking a two-pronged approach at this point: 1. While I don?t see anything wrong with our storage arrays, I have opened a ticket with the vendor (not IBM) to get them to look at things from that angle. 2. Since the problem moves around from time to time, we are enhancing our monitoring script to see if we can basically go from ?mmdiag ?iohist? to ?clients issuing those I/O requests? to ?jobs running on those clients? to see if there is any commonality there. Thanks again - much appreciated! ? 
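A rough sketch of step 2 above - mapping slow entries in "mmdiag --iohist" back to the node that issued them - could start from something like the following Python. The column heuristics (a float field holding milliseconds, an IPv4-looking token naming the peer node) are assumptions to check against your own iohist output, which differs between releases and between client and server nodes; the "jobs running on those clients" step would still need the scheduler's own tooling on top.

#!/usr/bin/env python3
# Rough sketch: flag slow I/Os in "mmdiag --iohist" and report the peer node.
# Field positions are heuristics, not a definitive parser of the iohist format.
import re
import subprocess

THRESHOLD_MS = 1000.0   # flag anything over one second, like the waits in this thread

def slow_ios():
    out = subprocess.check_output(
        ['/usr/lpp/mmfs/bin/mmdiag', '--iohist']).decode(errors='replace')
    for line in out.splitlines():
        fields = line.split()
        # Assumption: latency appears as a float in milliseconds and the peer
        # (NSD client or server) as a dotted IPv4 address somewhere on the line.
        latencies = [float(f) for f in fields if re.fullmatch(r'\d+\.\d+', f)]
        peers = [f for f in fields if re.fullmatch(r'(\d{1,3}\.){3}\d{1,3}', f)]
        if latencies and max(latencies) >= THRESHOLD_MS:
            yield max(latencies), (peers[-1] if peers else 'unknown'), line.strip()

if __name__ == '__main__':
    for ms, peer, line in slow_ios():
        print('{0:8.1f} ms  peer {1:<15}  {2}'.format(ms, peer, line))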
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 6, 2018, at 8:02 AM, Jim Doherty > wrote: You may want to get an mmtrace, but I suspect that the disk IOs are slow. The iohist is showing the time from when the start IO was issued until it was finished. Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it. If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L > wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. 
The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Fri Jul 6 18:13:26 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Fri, 6 Jul 2018 10:13:26 -0700 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: Hi Kevin, This is a bit of a "cargo cult" suggestion but one issue that I have seen is if a disk starts misbehaving a bit but does not fail, it slows down the whole raid group that it is in. And the only way to detect it is to examine the read/write latencies on the individual disks. Does your SAN allow you to do that? That happened to me at least twice in my life and replacing the offending individual disk solved the issue. This was on DDN, so the relevant command were something like 'show pd * counters write_lat' or similar, which showed the latency for the I/Os for each disk. If one disk in the group is an outlier (e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for that one disk. Another possibility for troubleshooting, if you have sufficient free resources: you can just suspend the problematic LUNs in GPFS, as that will remove the write load from them, while still having them service read requests and not affecting users. Regards, Alex On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Jim, > > Thank you for your response. We are taking a two-pronged approach at this > point: > > 1. While I don?t see anything wrong with our storage arrays, I have > opened a ticket with the vendor (not IBM) to get them to look at things > from that angle. > > 2. Since the problem moves around from time to time, we are enhancing our > monitoring script to see if we can basically go from ?mmdiag ?iohist? to > ?clients issuing those I/O requests? to ?jobs running on those clients? to > see if there is any commonality there. > > Thanks again - much appreciated! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > On Jul 6, 2018, at 8:02 AM, Jim Doherty wrote: > > You may want to get an mmtrace, but I suspect that the disk IOs are > slow. The iohist is showing the time from when the start IO was issued > until it was finished. Of course if you have disk IOs taking 10x too > long then other IOs are going to queue up behind it. If there are more > IOs than there are NSD server threads then there are going to be IOs that > are queued and waiting for a thread. > > Jim > > > On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L < > Kevin.Buterbaugh at Vanderbilt.Edu> wrote: > > > Hi All, > > First off, my apologies for the delay in responding back to the list ? > we?ve actually been working our tails off on this one trying to collect as > much data as we can on what is a very weird issue. While I?m responding to > Aaron?s e-mail, I?m going to try to address the questions raised in all the > responses. > > Steve - this all started last week. You?re correct about our mixed > workload. There have been no new workloads that I am aware of. > > Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. > > Aaron - no, this is not on a DDN, either. > > The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the > servers and storage. We have two SAN ?stacks? 
and all NSD servers and > storage are connected to both stacks. Linux multipathing handles path > failures. 10 GbE out to the network. > > We first were alerted to this problem by one of our monitoring scripts > which was designed to alert us to abnormally high I/O times, which, as I > mentioned previously, in our environment has usually been caused by cache > battery backup failures in the storage array controllers (but _not_ this > time). So I?m getting e-mails that in part read: > > Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. > Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. > > The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN > on that storage array. As I?ve mentioned, those two LUNs are by far and > away my most frequent problem children, but here?s another report from > today as well: > > Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. > Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. > Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. > Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. > > NSD server hostnames have been changed, BTW, from their real names to nsd1 > - 8. > > Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm > dump nsd? output. We wrote a Python script to pull out what we think is > the most pertinent information: > > nsd1 > 29 SMALL queues, 50 requests pending, 3741 was the highest number of > requests pending. > 348 threads started, 1 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 0 requests pending, 5694 was the highest number of > requests pending. > 348 threads started, 124 threads active, 348 was the highest number of > threads active. > nsd2 > 29 SMALL queues, 0 requests pending, 1246 was the highest number of > requests pending. > 348 threads started, 13 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 470 requests pending, 2404 was the highest number of > requests pending. > 348 threads started, 340 threads active, 348 was the highest number of > threads active. > nsd3 > 29 SMALL queues, 108 requests pending, 1796 was the highest number of > requests pending. > 348 threads started, 0 threads active, 348 was the highest number of > threads active. > 29 LARGE queues, 35 requests pending, 3331 was the highest number of > requests pending. > 348 threads started, 4 threads active, 348 was the highest number of > threads active. > nsd4 > 42 SMALL queues, 0 requests pending, 1529 was the highest number of > requests pending. > 504 threads started, 8 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 637 was the highest number of > requests pending. > 504 threads started, 211 threads active, 504 was the highest number of > threads active. > nsd5 > 42 SMALL queues, 182 requests pending, 2798 was the highest number of > requests pending. > 504 threads started, 6 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 407 requests pending, 4416 was the highest number of > requests pending. > 504 threads started, 8 threads active, 504 was the highest number of > threads active. > nsd6 > 42 SMALL queues, 0 requests pending, 1630 was the highest number of > requests pending. > 504 threads started, 0 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 148 was the highest number of > requests pending. 
> 504 threads started, 9 threads active, 504 was the highest number of > threads active. > nsd7 > 42 SMALL queues, 43 requests pending, 2179 was the highest number of > requests pending. > 504 threads started, 1 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 2551 was the highest number of > requests pending. > 504 threads started, 13 threads active, 504 was the highest number of > threads active. > nsd8 > 42 SMALL queues, 0 requests pending, 1014 was the highest number of > requests pending. > 504 threads started, 4 threads active, 504 was the highest number of > threads active. > 42 LARGE queues, 0 requests pending, 3371 was the highest number of > requests pending. > 504 threads started, 89 threads active, 504 was the highest number of > threads active. > > Note that we see more ?load? on the LARGE queue side of things and that > nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most > frequently in our alerts) are the heaviest loaded. > > One other thing we have noted is that our home grown RRDtool monitoring > plots that are based on netstat, iostat, vmstat, etc. also show an oddity. > Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 > (there are 4 in total) show up as 93 - 97% utilized. And another oddity > there is that eon34A and eon34B rarely show up on the alert e-mails, while > eon34C and eon34E show up waaaayyyyyyy more than anything else ? the > difference between them is that A and B are on the storage array itself and > C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve > actually checked and reseated those connections). > > Another reason why I could not respond earlier today is that one of the > things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 > from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool > on those two boxes to 40 GB. That has not made a difference. How can I > determine how much of the pagepool is actually being used, BTW? A quick > Google search didn?t help me. > > So we?re trying to figure out if we have storage hardware issues causing > GPFS issues or GPFS issues causing storage slowdowns. The fact that I see > slowdowns most often on one storage array points in one direction, while > the fact that at times I see even worse slowdowns on multiple other arrays > points the other way. The fact that some NSD servers show better stats > than others in the analysis of the ?mmfsadm dump nsd? output tells me ? > well, I don?t know what it tells me. > > I think that?s all for now. If you have read this entire very long > e-mail, first off, thank you! If you?ve read it and have ideas for where I > should go from here, T-H-A-N-K Y-O-U! > > Kevin > > > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > > > Hi Kevin, > > > > Just going out on a very weird limb here...but you're not by chance > seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. > SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high > latency on some of our SFA12ks (that have otherwise been solid both in > terms of stability and performance) but only on certain volumes and the > affected volumes change. It's very bizzarre and we've been working closely > with DDN to track down the root cause but we've not yet found a smoking > gun. The timing and description of your problem sounded eerily similar to > what we're seeing so I'd thought I'd ask. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > > > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > > > >> Hi all, > >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some > of our NSDs as reported by ?mmdiag ?iohist" and are struggling to > understand why. One of the > >> confusing things is that, while certain NSDs tend to show the problem > more than others, the problem is not consistent ? i.e. the problem tends to > move around from > >> NSD to NSD (and storage array to storage array) whenever we check ? > which is sometimes just a few minutes apart. > >> In the past when I have seen ?mmdiag ?iohist? report high wait times > like this it has *always* been hardware related. In our environment, the > most common cause has > >> been a battery backup unit on a storage array controller going bad and > the storage array switching to write straight to disk. But that?s *not* > happening this time. > >> Is there anything within GPFS / outside of a hardware issue that I > should be looking for?? Thanks! > >> ? > >> Kevin Buterbaugh - Senior System Administrator > >> Vanderbilt University - Advanced Computing Center for Research and > Education > >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jul 6 22:03:09 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 6 Jul 2018 22:03:09 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: <865c5f52-fa62-571f-aeef-9b1073dfa156@strath.ac.uk> On 06/07/18 02:11, Buterbaugh, Kevin L wrote: [SNIP] > > The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for > the servers and storage. We have two SAN ?stacks? and all NSD > servers and storage are connected to both stacks. Linux multipathing > handles path failures. 10 GbE out to the network. You don't mention it, but have you investigated your FC fabric? 
Dodgy laser, bad photodiode or damaged fibre can cause havoc. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Kevin.Buterbaugh at Vanderbilt.Edu Sat Jul 7 01:28:06 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sat, 7 Jul 2018 00:28:06 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> Message-ID: <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> Hi All, Another update on this issue as we have made significant progress today ? but first let me address the two responses I received. Alex - this is a good idea and yes, we did this today. We did see some higher latencies on one storage array as compared to the others. 10-20 ms on the ?good? storage arrays ? 50-60 ms on the one storage array. It took us a while to be able to do this because while the vendor provides a web management interface, that didn?t show this information. But they have an actual app that will ? and the Mac and Linux versions don?t work. So we had to go scrounge up this thing called a Windows PC and get the software installed there. ;-) Jonathan - also a good idea and yes, we also did this today. I?ll explain as part of the rest of this update. The main thing that we did today that has turned out to be most revealing is to take a list of all the NSDs in the impacted storage pool ? 19 devices spread out over 7 storage arrays ? and run read dd tests on all of them (the /dev/dm-XX multipath device). 15 of them showed rates of 33 - 100+ MB/sec and the variation is almost definitely explained by the fact that they?re in production use and getting hit by varying amounts of ?real? work. But 4 of them showed rates of 2-10 MB/sec and those 4 all happen to be on storage array eon34. So, to try to rule out everything but the storage array we replaced the FC cables going from the SAN switches to the array, plugging the new cables into different ports on the SAN switches. Then we repeated the dd tests from a different NSD server, which both eliminated the NSD server and its? FC cables as a potential cause ? and saw results virtually identical to the previous test. Therefore, we feel pretty confident that it is the storage array and have let the vendor know all of this. And there?s another piece of quite possibly relevant info ? the last week in May one of the controllers in this array crashed and rebooted (it?s a active-active dual controller array) ? when that happened the failover occurred ? with a major glitch. One of the LUNs essentially disappeared ? more accurately, it was there, but had no size! We?ve been using this particular vendor for 15 years now and I have seen more than a couple of their controllers go bad during that time and nothing like this had ever happened before. They were never able to adequately explain what happened there. So what I am personally suspecting has happened is that whatever caused that one LUN to go MIA has caused these issues with the other LUNs on the array. As an aside, we ended up using mmfileid to identify the files that had blocks on the MIA LUN and restored those from tape backup. I want to thank everyone who has offered their suggestions so far. I will update the list again once we have a definitive problem determination. I hope that everyone has a great weekend. 
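A minimal sketch of the kind of sequential-read sweep described above, assuming hypothetical /dev/dm-XX multipath device names (substitute the real devices backing the pool's NSDs); iflag=direct bypasses the page cache so the numbers reflect the array path rather than host memory:

    #!/bin/bash
    # Read 4 GiB from each multipath device and report the throughput line dd prints on stderr.
    DEVICES="/dev/dm-10 /dev/dm-11 /dev/dm-12"   # placeholder list -- use the real dm devices
    for dev in $DEVICES; do
        echo "== $dev =="
        dd if="$dev" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -1
    done

Running the same loop again from a second NSD server, as described above, is what separates a sick LUN from a sick server or cable.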
In the immortal words of the wisest man who ever lived, ?I?m kinda tired ? think I?ll go home now.? ;-) Kevin On Jul 6, 2018, at 12:13 PM, Alex Chekholko > wrote: Hi Kevin, This is a bit of a "cargo cult" suggestion but one issue that I have seen is if a disk starts misbehaving a bit but does not fail, it slows down the whole raid group that it is in. And the only way to detect it is to examine the read/write latencies on the individual disks. Does your SAN allow you to do that? That happened to me at least twice in my life and replacing the offending individual disk solved the issue. This was on DDN, so the relevant command were something like 'show pd * counters write_lat' or similar, which showed the latency for the I/Os for each disk. If one disk in the group is an outlier (e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for that one disk. Another possibility for troubleshooting, if you have sufficient free resources: you can just suspend the problematic LUNs in GPFS, as that will remove the write load from them, while still having them service read requests and not affecting users. Regards, Alex On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L > wrote: Hi Jim, Thank you for your response. We are taking a two-pronged approach at this point: 1. While I don?t see anything wrong with our storage arrays, I have opened a ticket with the vendor (not IBM) to get them to look at things from that angle. 2. Since the problem moves around from time to time, we are enhancing our monitoring script to see if we can basically go from ?mmdiag ?iohist? to ?clients issuing those I/O requests? to ?jobs running on those clients? to see if there is any commonality there. Thanks again - much appreciated! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 6, 2018, at 8:02 AM, Jim Doherty > wrote: You may want to get an mmtrace, but I suspect that the disk IOs are slow. The iohist is showing the time from when the start IO was issued until it was finished. Of course if you have disk IOs taking 10x too long then other IOs are going to queue up behind it. If there are more IOs than there are NSD server threads then there are going to be IOs that are queued and waiting for a thread. Jim On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L > wrote: Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). 
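For anyone wanting to build the same kind of alerting, a rough sketch of a threshold check over "mmdiag --iohist" output. The column holding the I/O completion time is an assumption here (it moves between releases), so treat the awk field number as a placeholder to be checked against the header lines of your own output:

    #!/bin/bash
    # Flag any recent I/O whose completion time exceeds THRESH_MS.
    # NOTE: field $8 as "time ms" is an assumption -- verify against 'mmdiag --iohist' headers.
    THRESH_MS=1000
    /usr/lpp/mmfs/bin/mmdiag --iohist | awk -v t="$THRESH_MS" \
        'NR > 2 && $8+0 > t { print "slow I/O:", $0 }'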
So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister > wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by ?mmdiag ?iohist" and are struggling to understand why. One of the >> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from >> NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times like this it has *always* been hardware related. 
In our environment, the most common cause has >> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C331014fd459d4151432308d5e340c4fa%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664789687076842&sdata=UhjNipQdsNjxIcUB%2Ffu2qEwn7K6tIBmGWEIruxGgI4A%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Caa277914313f445d702e08d5e363d347%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636664940252877301&sdata=bnjsWHwutbbKstghBrB5Y7%2FIzeX7U19vroW%2B0xA2gX8%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Sat Jul 7 09:42:57 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Sat, 7 Jul 2018 09:42:57 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> Message-ID: <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. ?Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. ?Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. 
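The switch counters are the authoritative place to look, but the hosts keep their own tallies as well; on Linux the FC HBA error counters are exposed under sysfs, so a quick sweep on each NSD server (a sketch, assuming standard fc_host entries) shows whether any link is quietly accumulating CRC or signal-loss errors:

    #!/bin/bash
    # Dump the error-related FC HBA counters on this NSD server.
    for h in /sys/class/fc_host/host*; do
        echo "== $h =="
        for c in error_frames invalid_crc_count link_failure_count \
                 loss_of_signal_count loss_of_sync_count; do
            printf '%-24s %s\n' "$c" "$(cat "$h/statistics/$c" 2>/dev/null)"
        done
    done

These are cumulative (hex) counters since the last HBA reset, so what matters is whether they grow between two runs, not their absolute values.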
The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Cameron.Dunn at bristol.ac.uk Fri Jul 6 17:36:14 2018 From: Cameron.Dunn at bristol.ac.uk (Cameron Dunn) Date: Fri, 6 Jul 2018 16:36:14 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , Message-ID: Thanks Christof, we had left out "gpfs" from the vfs objects = line in smb.conf so setting vfs objects = gpfs (etc) gpfs:hsm = yes gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) and setting the offline flag on the file by migrating it, so that # mmlsattr -L filename.jpg ... Misc attributes: ARCHIVE OFFLINE now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" and a standard icon with an X is displayed. But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, so we still risk a recall storm caused by them. All the best, Cameron ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Christof Schmitt Sent: 03 July 2018 20:37:08 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms > HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape > are then shared by Samba to Macs and PCs. > MacOS Finder and Windows Explorer will want to display all the thumbnail images of a > folder's contents, which will recall lots of files from tape. SMB clients can query file information, including the OFFLINE flag. With Spectrum Scale and the "gpfs" module loaded in Samba that is mapped from the the OFFLINE flag that is visible in "mmlsattr -L". In those systems, the SMB client can determine that a file is offline. In our experience this is handled correctly in Windows Explorer; when an "offline" file is encountered, no preview is generated from the file data. The Finder on Mac clients does not seem to honor the OFFLINE flag, thus the main problems are typically recall storms caused by Mac clients. > According to the Samba documentation this is preventable by setting the following > ---------------------------------------------- > https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html > > gpfs:recalls = [ yes | no ] > When this option is set to no, an attempt to open an offline file > will be rejected with access denied. > This helps preventing recall storms triggered by careless applications like Finder and Explorer. > > yes(default) - Open files that are offline. This will recall the files from HSM. > no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. > Using this setting also requires gpfs:hsm to be set to yes. > > gpfs:hsm = [ yes | no ] > Enable/Disable announcing if this FS has HSM enabled. > no(default) - Do not announce HSM. > yes - Announce HSM. > -------------------------------------------------- > > However we could not get this to work. 
> > On Centos7/Samba4.5, smb.conf contained > gpfs:hsm = yes > gpfs:recalls = no > (also tried setting gpfs:offline = yes, though this is not documented) These options apply to the "gpfs" module in Samba. The Samba version you are using needs to be built with GPFS support and the "gpfs" module needs to be loaded through the "vfs objects" configuration. As Centos7/Samba4.5 is mentioned, would guess that the CentOS provided Samba version is used, which is probably not compiled with GPFS support. >From IBM we would recommend to use CES for protocol services, which also provides Samba for SMB. The Samba provided through CES is configured so that the gpfs:recalls option can be used: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmsmb.htm gpfs:recalls If the value is set as yes files that have been migrated from disk will be recalled on access. By default, this is enabled. If recalls = no files will not be recalled on access and the client will receive ACCESS_DENIED message. > We made a share containing image files that were then migrated to tape by LTFS-EE, > to see if these flags were respected by OS X Finder or Windows Explorer. > > Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, > so that when browsing the stubs in the share, the files were recalled from tape > and the thumbnails displayed. > > Has anyone seen these flags working as they are supposed to ? Yes, they are working, as we use them in our Samba build. Debugging this would require looking at the Samba configuration and possibly collecting a trace. If my above assumption was wrong and this problem occurs with the CES Samba (gpfs.smb), please open a PMR for debugging this issue. If this is not the CES Samba, please contact the provider of the Samba package for additional support. Regards, Christof Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] preventing HSM tape recall storms Date: Tue, Jul 3, 2018 6:22 AM HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. 
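For reference, a minimal share definition of the sort being discussed, assuming a Samba build compiled with GPFS support and a hypothetical share name and path; the point this thread eventually turned on is that the gpfs module has to be listed in "vfs objects", otherwise the gpfs:* options are silently ignored:

    [archive]
        ; hypothetical GPFS path -- adjust to the real file system
        path = /gpfs/fs0/archive
        read only = no
        ; the gpfs VFS module must be loaded for the gpfs:* options to take effect
        vfs objects = gpfs
        gpfs:hsm = yes
        gpfs:recalls = no

With gpfs:recalls = no an open of a migrated (OFFLINE) file is rejected with access denied instead of triggering a tape recall; with the default of yes the file is recalled on access.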
On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From YARD at il.ibm.com Sun Jul 8 18:32:25 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sun, 8 Jul 2018 20:32:25 +0300 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu><397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu><733478365.61492.1530882158667@mail.yahoo.com><1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> Message-ID: Hi Clean all counters on the FC switches and see which port have errors . For brocade run : slotstatsclear statsclear porterrshow For cisco run: clear countersall There might be bad gbic/cable/Storage gbic, which can affect the performance, if there is something like that - u can see which ports have errors grow over time. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jonathan Buzzard To: gpfsug-discuss at spectrumscale.org Date: 07/07/2018 11:43 AM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=TM-kJsvzTX9cq_xmR5ITHclBCfO4FDvZ3ZxyugfJCfQ&s=Ass164qVEhb9fC4_VCmzfZeYd_BLOv9cZsfkrzqi8pM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From chris.schlipalius at pawsey.org.au Mon Jul 9 01:36:01 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 09 Jul 2018 08:36:01 +0800 Subject: [gpfsug-discuss] Upcoming meeting: Australian Spectrum Scale Usergroup 10th August 2018 Sydney Message-ID: <2BD2D9AA-774D-4D6E-A2E6-069E7E91F40E@pawsey.org.au> Dear members, Please note the next Australian Usergroup is confirmed. If you plan to attend, please register: http://bit.ly/2NiNFEQ Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 10708 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Mon Jul 9 09:51:25 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 9 Jul 2018 08:51:25 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> Message-ID: Did you upgrade the memory etc purely as a "maybe this will help" fix? If so, and it didn't help, I'd be tempted to reduce it again as you may introduce another problem into the environment. I wonder if your disks are about to die, although I suspect you'd have already been forewarned of errors from the disk(s) via the storage system. 
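On the chance that a single failing path or disk is dragging a whole LUN down, a quick host-side check (a sketch; the device names are whatever multipath reports on your NSD servers) is to look for paths that have dropped out and for SCSI errors in the kernel log:

    #!/bin/bash
    # Any path not in an 'active ready' state deserves a closer look.
    multipath -ll | grep -B 3 -E 'failed|faulty|offline'
    # Recent SCSI/medium errors logged by the kernel.
    dmesg | grep -iE 'I/O error|medium error|sense key' | tail -20

Per-disk latency outliers inside a RAID group, of the kind Alex described, generally only show up in the array's own statistics, so this mostly rules the host-visible layer in or out.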
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2018 02:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] High I/O wait times Hi All, First off, my apologies for the delay in responding back to the list ? we?ve actually been working our tails off on this one trying to collect as much data as we can on what is a very weird issue. While I?m responding to Aaron?s e-mail, I?m going to try to address the questions raised in all the responses. Steve - this all started last week. You?re correct about our mixed workload. There have been no new workloads that I am aware of. Stephen - no, this is not an ESS. We are running GPFS 4.2.3-8. Aaron - no, this is not on a DDN, either. The hardware setup is a vanilla 8 GB FC SAN. Commodity hardware for the servers and storage. We have two SAN ?stacks? and all NSD servers and storage are connected to both stacks. Linux multipathing handles path failures. 10 GbE out to the network. We first were alerted to this problem by one of our monitoring scripts which was designed to alert us to abnormally high I/O times, which, as I mentioned previously, in our environment has usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time). So I?m getting e-mails that in part read: Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms. Disk eon34Ensd on nsd4 has a service time of 3146.715 ms. The ?34? tells me what storage array and the ?C? or ?E? tells me what LUN on that storage array. As I?ve mentioned, those two LUNs are by far and away my most frequent problem children, but here?s another report from today as well: Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms. Disk eon28Ansd on nsd7 has a service time of 1154.002 ms. Disk eon31Ansd on nsd3 has a service time of 1068.987 ms. Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms. NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8. Based on Fred?s excellent advice, we took a closer look at the ?mmfsadm dump nsd? output. We wrote a Python script to pull out what we think is the most pertinent information: nsd1 29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending. 348 threads started, 1 threads active, 348 was the highest number of threads active. 29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending. 348 threads started, 124 threads active, 348 was the highest number of threads active. nsd2 29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending. 348 threads started, 13 threads active, 348 was the highest number of threads active. 29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending. 348 threads started, 340 threads active, 348 was the highest number of threads active. nsd3 29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending. 348 threads started, 0 threads active, 348 was the highest number of threads active. 29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending. 348 threads started, 4 threads active, 348 was the highest number of threads active. nsd4 42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. 
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending. 504 threads started, 211 threads active, 504 was the highest number of threads active. nsd5 42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending. 504 threads started, 6 threads active, 504 was the highest number of threads active. 42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending. 504 threads started, 8 threads active, 504 was the highest number of threads active. nsd6 42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending. 504 threads started, 0 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending. 504 threads started, 9 threads active, 504 was the highest number of threads active. nsd7 42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending. 504 threads started, 1 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending. 504 threads started, 13 threads active, 504 was the highest number of threads active. nsd8 42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending. 504 threads started, 4 threads active, 504 was the highest number of threads active. 42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending. 504 threads started, 89 threads active, 504 was the highest number of threads active. Note that we see more ?load? on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the heaviest loaded. One other thing we have noted is that our home grown RRDtool monitoring plots that are based on netstat, iostat, vmstat, etc. also show an oddity. Most of our LUNs show up as 33 - 68% utilized ? but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized. And another oddity there is that eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else ? the difference between them is that A and B are on the storage array itself and C and E are on JBOD?s SAS-attached to the storage array (and yes, we?ve actually checked and reseated those connections). Another reason why I could not respond earlier today is that one of the things which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 GB respectively to 64 GB each ? and I then upped the pagepool on those two boxes to 40 GB. That has not made a difference. How can I determine how much of the pagepool is actually being used, BTW? A quick Google search didn?t help me. So we?re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns. The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way. The fact that some NSD servers show better stats than others in the analysis of the ?mmfsadm dump nsd? output tells me ? well, I don?t know what it tells me. I think that?s all for now. If you have read this entire very long e-mail, first off, thank you! If you?ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U! 
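On the pagepool question above: as far as I know there is no single utilization percentage, but two quick checks (hedged -- the exact output varies by release) are to confirm the value actually in effect on the node and to look at the daemon's own memory report:

    # pagepool value currently in effect on this node
    /usr/lpp/mmfs/bin/mmdiag --config | grep -i pagepool
    # daemon memory report, including the memory pools mmfsd has allocated
    /usr/lpp/mmfs/bin/mmdiag --memory

Keep in mind that the pagepool is allocated and pinned at daemon startup, so OS-level tools will report it as used regardless of how much file data is actually cached in it.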
Kevin > On Jul 4, 2018, at 7:34 AM, Aaron Knister wrote: > > Hi Kevin, > > Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizzarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I'd thought I'd ask. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight > Center > (301) 286-2776 > > > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote: > >> Hi all, >> We are experiencing some high I/O wait times (5 - 20 seconds!) on >> some of our NSDs as reported by ?mmdiag ?iohist" and are struggling >> to understand why. One of the confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent ? i.e. the problem tends to move around from NSD to NSD (and storage array to storage array) whenever we check ? which is sometimes just a few minutes apart. >> In the past when I have seen ?mmdiag ?iohist? report high wait times >> like this it has *always* been hardware related. In our environment, the most common cause has been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk. But that?s *not* happening this time. >> Is there anything within GPFS / outside of a hardware issue that I should be looking for?? Thanks! >> ? >> Kevin Buterbaugh - Senior System Administrator Vanderbilt University >> - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug > .org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterb > augh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3b > e4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742&sdata=if1uC53Y7K3D% > 2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Mon Jul 9 17:57:18 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 9 Jul 2018 09:57:18 -0700 Subject: [gpfsug-discuss] GPFS Windows Mount In-Reply-To: References: Message-ID: Hello, Can you provide the Windows OS and GPFS versions. Does the mmmount hang indefinitely or for a finite time (like 30 seconds or so)? Do you see any GPFS waiters during the mmmount hang? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michael Holliday To: gpfsug main discussion list Date: 07/05/2018 08:12 AM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Those commands show no errors not do any of the log files. GPFS has started correctly and showing the cluster and all nodes as up and active. We appear to have found the command that is hanging during the mount - However I?m not sure why its hanging. mmwmi mountedfilesystems Michael From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Yaron Daniel Sent: 20 June 2018 16:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Windows Mount Also what does mmdiag --network + mmgetstate -a show ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Yaron Daniel" To: gpfsug main discussion list Date: 06/20/2018 06:31 PM Subject: Re: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org HI Which Windows OS level - which GPFS FS level , what cygwin version ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Michael Holliday To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 06/20/2018 05:49 PM Subject: [gpfsug-discuss] GPFS Windows Mount Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, We?ve being trying to get the windows system to mount GPFS. We?ve set the drive letter on the files system, and we can get the system added to the GPFS cluster and showing as active. When we try to mount the file system the system just sits and does nothing ? GPFS shows no errors or issues, there are no problems in the log files. The firewalls are stopped and as far as we can tell it should work. Does anyone have any experience with the GPFS windows client that may help us? Michael Michael Holliday RITTech MBCS Senior HPC & Research Data Systems Engineer | eMedLab Operations Team Scientific Computing | IT&S | The Francis Crick Institute 1, Midland Road | London | NW1 1AT | United Kingdom Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 
06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From christof.schmitt at us.ibm.com Mon Jul 9 19:53:36 2018 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 9 Jul 2018 18:53:36 +0000 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jul 9 19:57:38 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 9 Jul 2018 14:57:38 -0400 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: Another option is to request Apple to support the OFFLINE flag in the SMB protocol. The more Mac customers making such a request (I have asked others to do likewise) might convince Apple to add this checking to their SMB client. 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Christof Schmitt" To: gpfsug-discuss at spectrumscale.org Date: 07/09/2018 02:53 PM Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Sent by: gpfsug-discuss-bounces at spectrumscale.org > we had left out "gpfs" from the > vfs objects = > line in smb.conf > > so setting > vfs objects = gpfs (etc) > gpfs:hsm = yes > gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) Thank you for the update. gpfs:recalls=yes is the default, allowing recalls of files. If you set that to 'no', Samba will deny access to "OFFLINE" files in GPFS through SMB. > and setting the offline flag on the file by migrating it, so that > # mmlsattr -L filename.jpg > ... > Misc attributes: ARCHIVE OFFLINE > > now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" > > and a standard icon with an X is displayed. > > But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Yes, that is working as intended. While the file is only in the "external pool" (e.g. HSM tape), the OFFLINE flag is reported. Once you read/write data, that triggers a recall to the disk pool and the flag is cleared. > Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, > > so we still risk a recall storm caused by them. The question here would be how to handle the Mac clients. You could configured two SMB shares on the same path: One with gpfs:recalls=yes and tell the Windows users to access that share; the other one with gpfs:recalls=no and tell the Mac users to use that share. That would avoid the recall storms, but runs the risk of Mac users connecting to the wrong share and avoiding this workaround... Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms Date: Sat, Jul 7, 2018 2:30 PM Thanks Christof, we had left out "gpfs" from the vfs objects = line in smb.conf so setting vfs objects = gpfs (etc) gpfs:hsm = yes gpfs:recalls = yes (not "no" as I had originally, and is implied by the manual) and setting the offline flag on the file by migrating it, so that # mmlsattr -L filename.jpg ... Misc attributes: ARCHIVE OFFLINE now Explorer on Windows 7 and 10 do not recall the file while viewing the folder with "Large icons" and a standard icon with an X is displayed. But after the file is then opened and recalled, the icon displays the thumbnail image and the OFFLINE flag is lost. Also as you observed, Finder on MacOSX 10.13 ignores the file's offline flag, so we still risk a recall storm caused by them. All the best, Cameron From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Christof Schmitt Sent: 03 July 2018 20:37:08 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms > HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape > are then shared by Samba to Macs and PCs. > MacOS Finder and Windows Explorer will want to display all the thumbnail images of a > folder's contents, which will recall lots of files from tape. 
SMB clients can query file information, including the OFFLINE flag. With Spectrum Scale and the "gpfs" module loaded in Samba that is mapped from the the OFFLINE flag that is visible in "mmlsattr -L". In those systems, the SMB client can determine that a file is offline. In our experience this is handled correctly in Windows Explorer; when an "offline" file is encountered, no preview is generated from the file data. The Finder on Mac clients does not seem to honor the OFFLINE flag, thus the main problems are typically recall storms caused by Mac clients. > According to the Samba documentation this is preventable by setting the following > ---------------------------------------------- > https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html > > gpfs:recalls = [ yes | no ] > When this option is set to no, an attempt to open an offline file > will be rejected with access denied. > This helps preventing recall storms triggered by careless applications like Finder and Explorer. > > yes(default) - Open files that are offline. This will recall the files from HSM. > no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. > Using this setting also requires gpfs:hsm to be set to yes. > > gpfs:hsm = [ yes | no ] > Enable/Disable announcing if this FS has HSM enabled. > no(default) - Do not announce HSM. > yes - Announce HSM. > -------------------------------------------------- > > However we could not get this to work. > > On Centos7/Samba4.5, smb.conf contained > gpfs:hsm = yes > gpfs:recalls = no > (also tried setting gpfs:offline = yes, though this is not documented) These options apply to the "gpfs" module in Samba. The Samba version you are using needs to be built with GPFS support and the "gpfs" module needs to be loaded through the "vfs objects" configuration. As Centos7/Samba4.5 is mentioned, would guess that the CentOS provided Samba version is used, which is probably not compiled with GPFS support. >From IBM we would recommend to use CES for protocol services, which also provides Samba for SMB. The Samba provided through CES is configured so that the gpfs:recalls option can be used: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmsmb.htm gpfs:recalls If the value is set as yes files that have been migrated from disk will be recalled on access. By default, this is enabled. If recalls = no files will not be recalled on access and the client will receive ACCESS_DENIED message. > We made a share containing image files that were then migrated to tape by LTFS-EE, > to see if these flags were respected by OS X Finder or Windows Explorer. > > Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, > so that when browsing the stubs in the share, the files were recalled from tape > and the thumbnails displayed. > > Has anyone seen these flags working as they are supposed to ? Yes, they are working, as we use them in our Samba build. Debugging this would require looking at the Samba configuration and possibly collecting a trace. If my above assumption was wrong and this problem occurs with the CES Samba (gpfs.smb), please open a PMR for debugging this issue. If this is not the CES Samba, please contact the provider of the Samba package for additional support. 
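To make the two-share workaround above concrete, a sketch using the CES mmsmb command; the share names and path are hypothetical, and this assumes the CES-provided gpfs.smb rather than a self-built Samba:

    # One share for Windows clients, with the default behaviour (recalls allowed)
    mmsmb export add winshare /gpfs/fs0/archive
    # A second share on the same path for Mac clients, with recalls denied
    mmsmb export add macshare /gpfs/fs0/archive
    mmsmb export change macshare --option "gpfs:recalls=no"
    # Confirm the exports
    mmsmb export list

The obvious weakness, as noted, is that nothing stops a Mac user from mounting the Windows-facing share.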
Regards, Christof Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: Cameron Dunn Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] preventing HSM tape recall storms Date: Tue, Jul 3, 2018 6:22 AM HSM over LTFS-EE runs the risk of a recall storm if files which have been migrated to tape are then shared by Samba to Macs and PCs. MacOS Finder and Windows Explorer will want to display all the thumbnail images of a folder's contents, which will recall lots of files from tape. According to the Samba documentation this is preventable by setting the following ---------------------------------------------- https://www.samba.org/samba/docs/current/man-html/vfs_gpfs.8.html gpfs:recalls = [ yes | no ] When this option is set to no, an attempt to open an offline file will be rejected with access denied. This helps preventing recall storms triggered by careless applications like Finder and Explorer. yes(default) - Open files that are offline. This will recall the files from HSM. no - Reject access to offline files with access denied. This will prevent recalls of files from HSM. Using this setting also requires gpfs:hsm to be set to yes. gpfs:hsm = [ yes | no ] Enable/Disable announcing if this FS has HSM enabled. no(default) - Do not announce HSM. yes - Announce HSM. -------------------------------------------------- However we could not get this to work. On Centos7/Samba4.5, smb.conf contained gpfs:hsm = yes gpfs:recalls = no (also tried setting gpfs:offline = yes, though this is not documented) We made a share containing image files that were then migrated to tape by LTFS-EE, to see if these flags were respected by OS X Finder or Windows Explorer. Neither Mac OS X (using SMB3) or Windows 7 (using SMB2) respected the settings, so that when browsing the stubs in the share, the files were recalled from tape and the thumbnails displayed. Has anyone seen these flags working as they are supposed to ? Many thanks for any ideas, Cameron Cameron Dunn Advanced Computing Systems Administrator Advanced Computing Research Centre University of Bristol _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 20:31:32 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 19:31:32 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 9 21:21:29 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 9 Jul 2018 20:21:29 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Message-ID: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> I don?t think you can do it directly, but you could probably use FileHeat to figure it out indirectly. Look at mmchconfig on how to set these: fileHeatLossPercent 20 fileHeatPeriodMinutes 1440 Then you can run a fairly simple policy scan to dump out the file names and heat value, sort what?s the most active to the top. I?ve done this, and it can prove helpful: define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Monday, July 9, 2018 at 3:04 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] What NSDs does a file have blocks on? Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Mon Jul 9 21:51:34 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 9 Jul 2018 16:51:34 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> References: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Message-ID: Hi Kevin, >>I want to know what NSDs a single file has its? blocks on? You may use /usr/lpp/mmfs/samples/fpo/mmgetlocationto obtain the file-to-NSD block layout map. Use the -h option for this tools usage ( mmgetlocation -h). Sample output is below: # File-system block size is 4MiB and sample file is 40MiB. 
# ls -lh /mnt/gpfs3a/data_out/lf -rw-r--r-- 1 root root 40M Jul 9 16:42 /mnt/gpfs3a/data_out/lf # du -sh /mnt/gpfs3a/data_out/lf 40M /mnt/gpfs3a/data_out/lf # mmlsfs gpfs3a | grep 'Block size' -B 4194304 Block size # The file data is striped across 10 x NSDs (DMD_NSDX) constituting the file-system # /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /mnt/gpfs3a/data_out/lf [FILE /mnt/gpfs3a/data_out/lf INFORMATION] FS_DATA_BLOCKSIZE : 4194304 (bytes) FS_META_DATA_BLOCKSIZE : 4194304 (bytes) FS_FILE_DATAREPLICA : 1 FS_FILE_METADATAREPLICA : 1 FS_FILE_STORAGEPOOLNAME : system FS_FILE_ALLOWWRITEAFFINITY : no FS_FILE_WRITEAFFINITYDEPTH : 0 FS_FILE_BLOCKGROUPFACTOR : 1 chunk(s)# 0 (offset 0) : [DMD_NSD5 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 1 (offset 4194304) : [DMD_NSD6 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 2 (offset 8388608) : [DMD_NSD7 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 3 (offset 12582912) : [DMD_NSD8 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 4 (offset 16777216) : [DMD_NSD9 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 5 (offset 20971520) : [DMD_NSD10 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 6 (offset 25165824) : [DMD_NSD1 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 7 (offset 29360128) : [DMD_NSD2 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 8 (offset 33554432) : [DMD_NSD3 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 9 (offset 37748736) : [DMD_NSD4 c72f1m5u39ib0,c72f1m5u37ib0] [FILE: /mnt/gpfs3a/data_out/lf SUMMARY INFO] replica1: c72f1m5u37ib0,c72f1m5u39ib0: 5 chunk(s) c72f1m5u39ib0,c72f1m5u37ib0: 5 chunk(s) Thanks and Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/09/2018 04:05 PM Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Jul 9 22:04:15 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 9 Jul 2018 17:04:15 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> Message-ID: (psss... ) tsdbfs Not responsible for anything bad that happens...! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 22:03:21 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 21:03:21 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: References: <4DFFEEC3-3AF6-4BAB-8D6C-7C6413469A44@vanderbilt.edu> Message-ID: <7D0DA547-4C19-4AE8-AFF8-BB0FBBF487AA@vanderbilt.edu> Hi Kums, Thanks so much ? this gave me exactly what I was looking for and the output was what I suspected I would see. 
Unfortunately, that means that the mystery of why we?re having these occasional high I/O wait times persists, but oh well? Kevin On Jul 9, 2018, at 3:51 PM, Kumaran Rajaram > wrote: Hi Kevin, >>I want to know what NSDs a single file has its? blocks on? You may use /usr/lpp/mmfs/samples/fpo/mmgetlocationto obtain the file-to-NSD block layout map. Use the -h option for this tools usage (mmgetlocation -h). Sample output is below: # File-system block size is 4MiB and sample file is 40MiB. # ls -lh /mnt/gpfs3a/data_out/lf -rw-r--r-- 1 root root 40M Jul 9 16:42 /mnt/gpfs3a/data_out/lf # du -sh /mnt/gpfs3a/data_out/lf 40M /mnt/gpfs3a/data_out/lf # mmlsfs gpfs3a | grep 'Block size' -B 4194304 Block size # The file data is striped across 10 x NSDs (DMD_NSDX) constituting the file-system # /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /mnt/gpfs3a/data_out/lf [FILE /mnt/gpfs3a/data_out/lf INFORMATION] FS_DATA_BLOCKSIZE : 4194304 (bytes) FS_META_DATA_BLOCKSIZE : 4194304 (bytes) FS_FILE_DATAREPLICA : 1 FS_FILE_METADATAREPLICA : 1 FS_FILE_STORAGEPOOLNAME : system FS_FILE_ALLOWWRITEAFFINITY : no FS_FILE_WRITEAFFINITYDEPTH : 0 FS_FILE_BLOCKGROUPFACTOR : 1 chunk(s)# 0 (offset 0) : [DMD_NSD5 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 1 (offset 4194304) : [DMD_NSD6 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 2 (offset 8388608) : [DMD_NSD7 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 3 (offset 12582912) : [DMD_NSD8 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 4 (offset 16777216) : [DMD_NSD9 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 5 (offset 20971520) : [DMD_NSD10 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 6 (offset 25165824) : [DMD_NSD1 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 7 (offset 29360128) : [DMD_NSD2 c72f1m5u39ib0,c72f1m5u37ib0] chunk(s)# 8 (offset 33554432) : [DMD_NSD3 c72f1m5u37ib0,c72f1m5u39ib0] chunk(s)# 9 (offset 37748736) : [DMD_NSD4 c72f1m5u39ib0,c72f1m5u37ib0] [FILE: /mnt/gpfs3a/data_out/lf SUMMARY INFO] replica1: c72f1m5u37ib0,c72f1m5u39ib0: 5 chunk(s) c72f1m5u39ib0,c72f1m5u37ib0: 5 chunk(s) Thanks and Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/09/2018 04:05 PM Subject: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I am still working on my issue of the occasional high I/O wait times and that has raised another question ? I know that I can run mmfileid to see what files have a block on a given NSD, but is there a way to do the opposite? I.e. I want to know what NSDs a single file has its? blocks on? The mmlsattr command does not appear to show this information unless it?s got an undocumented option. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C523052f2a40c48efb5a808d5e5ddc6b0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636667663044944884&sdata=Q2Wg8yDwA9yu%2FZgJXELr7V3qHAY7I7eKPTBHkqVKA5I%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Mon Jul 9 22:21:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 9 Jul 2018 21:21:41 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? In-Reply-To: References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> Message-ID: <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 9 22:44:07 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 9 Jul 2018 21:44:07 +0000 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: <20180708174441.EE5BB17B422@gpfsug.org> References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> <20180708174441.EE5BB17B422@gpfsug.org> Message-ID: Hi All, Time for a daily update on this saga? First off, responses to those who have responded to me: Yaron - we have QLogic switches, but I?ll RTFM and figure out how to clear the counters ? with a quick look via the CLI interface to one of them I don?t see how to even look at those counters, must less clear them, but I?ll do some digging. QLogic does have a GUI app, but given that the Mac version is PowerPC only I think that?s a dead end! :-O Jonathan - understood. We were just wanting to eliminate as much hardware as potential culprits as we could. The storage arrays will all get a power-cycle this Sunday when we take a downtime to do firmware upgrades on them ? the vendor is basically refusing to assist further until we get on the latest firmware. So ? we had noticed that things seem to calm down starting Friday evening and continuing throughout the weekend. We have a script that runs every half hour and if there?s any NSD servers where ?mmdiag ?iohist? shows an I/O > 1,000 ms, we get an alert (again, designed to alert us of a CBM failure). We only got three all weekend long (as opposed to last week, when the alerts were coming every half hour round the clock). Then, this morning I repeated the ?dd? test that I had run before and after replacing the FC cables going to ?eon34? and which had showed very typical I/O rates for all the NSDs except for the 4 in eon34, which were quite poor (~1.5 - 10 MB/sec). I ran the new tests this morning from different NSD servers and with a higher ?count? passed to dd to eliminate any potential caching effects. I ran the test twice from two different NSD servers and this morning all NSDs - including those on eon34 - showed normal I/O rates! Argh - so do we have a hardware problem or not?!? I still think we do, but am taking *nothing* for granted at this point! So today we also used another script we?ve written to do some investigation ? basically we took the script which runs ?mmdiag ?iohist? 
and added some options to it so that for every I/O greater than the threshold it will see which client issued the I/O. It then queries SLURM to see what jobs are running on that client. Interestingly enough, one user showed up waaaayyyyyy more often than anybody else. And many times she was on a node with only one other user who we know doesn?t access the GPFS filesystem and other times she was the only user on the node. We certainly recognize that correlation is not causation (she could be a victim and not the culprit), but she was on so many of the reported clients that we decided to investigate further ? but her jobs seem to have fairly modest I/O requirements. Each one processes 4 input files, which are basically just gzip?d text files of 1.5 - 5 GB in size. This is what, however, prompted my other query to the list about determining which NSDs a given file has its? blocks on. I couldn?t see how files of that size could have all their blocks on only a couple of NSDs in the pool (out of 19 total!) but wanted to verify that. The files that I have looked at are evenly spread out across the NSDs. So given that her files are spread across all 19 NSDs in the pool and the high I/O wait times are almost always only on LUNs in eon34 (and, more specifically, on two of the four LUNs in eon34) I?m pretty well convinced it?s not her jobs causing the problems ? I?m back to thinking a weird hardware issue. But if anyone wants to try to convince me otherwise, I?ll listen? Thanks! Kevin On Jul 8, 2018, at 12:32 PM, Yaron Daniel > wrote: Hi Clean all counters on the FC switches and see which port have errors . For brocade run : slotstatsclear statsclear porterrshow For cisco run: clear countersall There might be bad gbic/cable/Storage gbic, which can affect the performance, if there is something like that - u can see which ports have errors grow over time. Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Jonathan Buzzard > To: gpfsug-discuss at spectrumscale.org Date: 07/07/2018 11:43 AM Subject: Re: [gpfsug-discuss] High I/O wait times Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On 07/07/18 01:28, Buterbaugh, Kevin L wrote: [SNIP] > > So, to try to rule out everything but the storage array we replaced the > FC cables going from the SAN switches to the array, plugging the new > cables into different ports on the SAN switches. Then we repeated the > dd tests from a different NSD server, which both eliminated the NSD > server and its? FC cables as a potential cause ? and saw results > virtually identical to the previous test. Therefore, we feel pretty > confident that it is the storage array and have let the vendor know all > of this. I was not thinking of doing anything quite as drastic as replacing stuff, more look into the logs on the switches in the FC network and examine them for packet errors. The above testing didn't eliminate bad optics in the storage array itself for example, though it does appear to be the storage arrays themselves. Sounds like they could do with a power cycle... JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=TM-kJsvzTX9cq_xmR5ITHclBCfO4FDvZ3ZxyugfJCfQ&s=Ass164qVEhb9fC4_VCmzfZeYd_BLOv9cZsfkrzqi8pM&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7c1ced16f6d44055c63408d5e4fa7d2e%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636666686866066749&sdata=Viltitj3L9aScuuVKCLSp9FKkj7xdzWxsvvPVDSUqHw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 10 12:59:18 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 10 Jul 2018 11:59:18 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Tue Jul 10 13:29:59 2018 From: spectrumscale at kiranghag.com (KG) Date: Tue, 10 Jul 2018 17:59:59 +0530 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: Addendum to the question... How is this calculated? I figured out it is based on NSD sizes that are initially used but not exactly how. ?KG? On Tue, Jul 10, 2018 at 5:29 PM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > File system was originally created with 1TB NSDs (4) and I want to move it > to one 5TB NSD. Any way around this error? > > > > mmadddisk fs1 -F new.nsd > > > > The following disks of proserv will be formatted on node srv-gpfs06: > > stor1v5tb85: size 5242880 MB > > Extending Allocation Map > > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > > *Allocation map cannot accommodate disks larger than 4194555 MB.* > > Checking Allocation Map for storage pool Plevel1 > > mmadddisk: tsadddisk failed. > > Verifying file system configuration information ... > > mmadddisk: Propagating the cluster configuration data to all > > affected nodes. This is an asynchronous process. > > mmadddisk: Command failed. Examine previous error messages to determine > cause. > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_johnson at brown.edu Tue Jul 10 13:42:55 2018 From: david_johnson at brown.edu (David D Johnson) Date: Tue, 10 Jul 2018 08:42:55 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: Whenever we start with adding disks of new sizes/speeds/controllers/machine rooms compared to existing NSD's in the filesystem, we generally add them to a new storage pool. Add policy rules to make use of the new pools as desired, migrate stale files to slow disk, active files to faster/newer disk, etc. > On Jul 10, 2018, at 8:29 AM, KG wrote: > > Addendum to the question... > > How is this calculated? I figured out it is based on NSD sizes that are initially used but not exactly how. > > > ?KG? > > On Tue, Jul 10, 2018 at 5:29 PM, Oesterlin, Robert > wrote: > File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? > > > > mmadddisk fs1 -F new.nsd > > > > The following disks of proserv will be formatted on node srv-gpfs06: > > stor1v5tb85: size 5242880 MB > > Extending Allocation Map > > Disk stor1v5tb85 cannot be added to storage pool Plevel1. > > Allocation map cannot accommodate disks larger than 4194555 MB. > > Checking Allocation Map for storage pool Plevel1 > > mmadddisk: tsadddisk failed. > > Verifying file system configuration information ... > > mmadddisk: Propagating the cluster configuration data to all > > affected nodes. This is an asynchronous process. > > mmadddisk: Command failed. Examine previous error messages to determine cause. > > > > > > Bob Oesterlin > > Sr Principal Storage Engineer, Nuance > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jul 10 14:00:48 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 10 Jul 2018 14:00:48 +0100 Subject: [gpfsug-discuss] preventing HSM tape recall storms In-Reply-To: References: , , Message-ID: <1531227648.26036.139.camel@strath.ac.uk> On Mon, 2018-07-09 at 14:57 -0400, Frederick Stock wrote: > Another option is to request Apple to support the OFFLINE flag in the > SMB protocol. ?The more Mac customers making such a request (I have > asked others to do likewise) might convince Apple to add this > checking to their SMB client. > And we have a winner. The only workable solution is to get Apple to Finder to support the OFFLINE flag. However good luck getting Apple to actually do anything. An alternative approach might be to somehow detect the client connecting is running MacOS and prohibit recalls for them. However I am not sure the Samba team would be keen on accepting such patches unless it could be done in say VFS module. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From makaplan at us.ibm.com Tue Jul 10 14:08:45 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 10 Jul 2018 09:08:45 -0400 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? 
In-Reply-To: <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> Message-ID: As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson To: gpfsug main discussion list Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: on behalf of "makaplan at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Jul 10 14:12:02 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 10 Jul 2018 14:12:02 +0100 Subject: [gpfsug-discuss] High I/O wait times In-Reply-To: References: <8B0CB2B9-64A5-4CE7-99F6-6DDA4EF1ACB5@vanderbilt.edu> <397DC5DA-727A-4517-82EB-46A1C08591B4@vanderbilt.edu> <733478365.61492.1530882158667@mail.yahoo.com> <1BBB7384-9575-440C-A5E8-3C2E2E56E96B@vanderbilt.edu> <288fec35-d6c8-b76f-d9de-5dc375744ec6@strath.ac.uk> <20180708174441.EE5BB17B422@gpfsug.org> Message-ID: <1531228322.26036.143.camel@strath.ac.uk> On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote: [SNIP] > Interestingly enough, one user showed up waaaayyyyyy more often than > anybody else. ?And many times she was on a node with only one other > user who we know doesn?t access the GPFS filesystem and other times > she was the only user on the node. ? > I have seen on our old HPC system which had been running fine for three years a particular user with a particular piece of software with presumably a particular access pattern trigger a firmware bug in a SAS drive (local disk to the node) that caused it to go offline (dead to the world and power/presence LED off) and only a power cycle of the node would bring it back. At first we through the drives where failing, because what the hell, but in the end a firmware update to the drives and they where fine. The moral of the story is don't rule out wacky access patterns from a single user causing problems. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From UWEFALKE at de.ibm.com Tue Jul 10 15:28:57 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 10 Jul 2018 16:28:57 +0200 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: Hi Bob, you sure the first added NSD was 1 TB? 
Whenever I created a FS, the maximum NSD size was way larger than the one I added initially, not just fourfold.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122

From: "Oesterlin, Robert"
To: gpfsug main discussion list
Date: 10/07/2018 13:59
Subject: [gpfsug-discuss] Allocation map limits - any way around this?
Sent by: gpfsug-discuss-bounces at spectrumscale.org

File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error?

mmadddisk fs1 -F new.nsd

The following disks of proserv will be formatted on node srv-gpfs06:
stor1v5tb85: size 5242880 MB
Extending Allocation Map
Disk stor1v5tb85 cannot be added to storage pool Plevel1.
Allocation map cannot accommodate disks larger than 4194555 MB.
Checking Allocation Map for storage pool Plevel1
mmadddisk: tsadddisk failed.
Verifying file system configuration information ...
mmadddisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
mmadddisk: Command failed. Examine previous error messages to determine cause.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From p.childs at qmul.ac.uk Tue Jul 10 15:50:54 2018
From: p.childs at qmul.ac.uk (Peter Childs)
Date: Tue, 10 Jul 2018 14:50:54 +0000
Subject: [gpfsug-discuss] Same file opened by many nodes / processes
Message-ID: <4e038c492713f418242be208532e112f8ea50a9f.camel at qmul.ac.uk>

We have a situation where the same file is being read by around 5000 "jobs"; this is an array job in uge with a tc (maximum concurrent tasks) limit set, so the file in question is being opened by about 100 processes/jobs at the same time.

It's a ~200GB file, so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes.

This is read-only access to the file; I don't know the specifics about the job.

It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file).

I'm wondering if there is anything we can do to improve things or that can be tuned within GPFS. I don't think we have an issue with token management, but would increasing maxFilesToCache on our token manager node help, say?

Is there anything else I should look at, to try to allow GPFS to share this file better.
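For reference, the sort of checking and tuning I have in mind looks roughly like this (the value and the node class below are placeholders only, not a recommendation):

   # current settings
   mmlsconfig maxFilesToCache
   mmlsconfig maxStatCache
   # token manager usage
   mmdiag --tokenmgr
   # raise the limit on selected nodes; a maxFilesToCache change only takes
   # effect after GPFS is restarted on the affected nodes
   mmchconfig maxFilesToCache=200000 -N managerNodes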
Thanks in advance

Peter Childs
--
Peter Childs
ITS Research Storage
Queen Mary, University of London

From bpappas at dstonline.com Tue Jul 10 16:08:03 2018
From: bpappas at dstonline.com (Bill Pappas)
Date: Tue, 10 Jul 2018 15:08:03 +0000
Subject: [gpfsug-discuss] preventing HSM tape recall storms (Bill Pappas)
In-Reply-To: References: Message-ID:

Years back I did run a trial (to buy) software solution on OSX to address this issue. It worked! It was not cheap and they probably no longer support it anyway. It might have been from a company called Group Logic. I would suggest not exposing HSM enabled file systems (in particular ones using tape on the back end) to your general CIFS (or even) GPFS/NFS clients. It produced years (2011-2015) of frustration with recall storms that made everyone mad. If someone else had success, I think we'd all like to know how they did it....but we gave up on that. In the end I would suggest setting up an explicit archive location using HSM tape (or low cost, high density disk) that is not pointing to your traditional GPFS/CIFS/NFS clients that users must deliberately access (think portal) to check in/out cold data that they can stage to their primary workspace. It is possible you considered this idea or some variation of it anyway and rejected it for good reason (e.g. more pain for the users to stage data over from cold storage to primary workspace).

Bill Pappas
901-619-0585
bpappas at dstonline.com

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org
Sent: Tuesday, July 10, 2018 9:50 AM
To: gpfsug-discuss at spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 78, Issue 32

Today's Topics:

1. Re: preventing HSM tape recall storms (Jonathan Buzzard)
2. Re: What NSDs does a file have blocks on? (Marc A Kaplan)
3. Re: High I/O wait times (Jonathan Buzzard)
4. Re: Allocation map limits - any way around this? (Uwe Falke)
5. Same file opened by many nodes / processes (Peter Childs)

From salut4tions at gmail.com Tue Jul 10 16:54:36 2018
From: salut4tions at gmail.com (Jordan Robertson)
Date: Tue, 10 Jul 2018 11:54:36 -0400
Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this?
In-Reply-To: References: Message-ID:

To second David's comments: I don't believe changing the max NSD size for a given storage pool is possible (it may be tied to the per-pool allocation mapping?), so if you want to add more dataOnly NSD's to a filesystem and get that error you may need to create a new pool. The tricky bit is that I think this only works with dataOnly NSD's, as dataAndMetadata and metadataOnly NSD's only get added to the system pool which is locked in like any other.

-Jordan

On Tue, Jul 10, 2018 at 10:28 AM, Uwe Falke wrote:
> Hi Bob,
> you sure the first added NSD was 1 TB? As often as i created a FS, the max
> NSD size was way larger than the one I added initially , not just the
> fourfold.
>
> Mit freundlichen Gr??en / Kind regards
>
> Dr. Uwe Falke
> IT Specialist
> High Performance Computing Services / Integrated Technology Services /
> Data Center Services
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefalke at de.ibm.com
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung:
> Thomas Wolter, Sven Schoo?
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
> HRB 17122
>
> From: "Oesterlin, Robert"
> To: gpfsug main discussion list
> Date: 10/07/2018 13:59
> Subject: [gpfsug-discuss] Allocation map limits - any way around this?
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> File system was originally created with 1TB NSDs (4) and I want to move it
> to one 5TB NSD. Any way around this error?
>
> mmadddisk fs1 -F new.nsd
>
> The following disks of proserv will be formatted on node srv-gpfs06:
> stor1v5tb85: size 5242880 MB
> Extending Allocation Map
> Disk stor1v5tb85 cannot be added to storage pool Plevel1.
> Allocation map cannot accommodate disks larger than 4194555 MB.
> Checking Allocation Map for storage pool Plevel1
> mmadddisk: tsadddisk failed.
> Verifying file system configuration information ...
> mmadddisk: Propagating the cluster configuration data to all
> affected nodes. This is an asynchronous process.
> mmadddisk: Command failed. Examine previous error messages to determine
> cause.
> > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Tue Jul 10 16:59:57 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 10 Jul 2018 17:59:57 +0200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi, Peter, in theory, the first node opening a file should remain metanode until it closes the file, regardless how many other nodes open it in between (if all the nodes are within the same cluster). MFT is controlling the caching inodes and - AFAIK - also of indirect blocks. A 200 GiB file will most likely have indirect blocks, but just a few up to some tens, depending on the block size in the file system. The default MFT number is much larger. However, if you say the metanode is changing, that might cause some delays, as all token information has to be passed on to the next metanode (not sure how efficient that election is running). Having said that it could help if you use a dedicated node having the file open from start and all the time - this should prevent new metanodes being elected. If you do not get told a solution, you might want to run a trace of the mmbackup scan (maybe once with jobs accessing the file, once without). Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 10/07/2018 16:51 Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. 
It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Tue Jul 10 17:15:14 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 10 Jul 2018 12:15:14 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: I would start by making sure that the application(s)... open the file O_RDONLY and then you may want to fiddle with the GPFS atime settings: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm At first I thought "uge" was a typo, but I guess you are referring to: https://supcom.hgc.jp/english/utili_info/manual/uge.html Still not begin familiar, it would be "interesting" to know from a file operations point of view, what's going on in terms of opens, reads, closes : per second. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 17:17:58 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 12:17:58 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: References: Message-ID: The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. 
Fred Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Jordan Robertson To: gpfsug main discussion list Date: 07/10/2018 11:54 AM Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org To second David's comments: I don't believe changing the max NSD size for a given storage pool is possible (it may be tied to the per-pool allocation mapping?), so if you want to add more dataOnly NSD's to a filesystem and get that error you may need to create a new pool. The tricky bit is that I think this only works with dataOnly NSD's, as dataAndMetadata and metadataOnly NSD's only get added to the system pool which is locked in like any other. -Jordan On Tue, Jul 10, 2018 at 10:28 AM, Uwe Falke wrote: Hi Bob, you sure the first added NSD was 1 TB? As often as i created a FS, the max NSD size was way larger than the one I added initially , not just the fourfold. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10/07/2018 13:59 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Sent by: gpfsug-discuss-bounces at spectrumscale.org File system was originally created with 1TB NSDs (4) and I want to move it to one 5TB NSD. Any way around this error? mmadddisk fs1 -F new.nsd The following disks of proserv will be formatted on node srv-gpfs06: stor1v5tb85: size 5242880 MB Extending Allocation Map Disk stor1v5tb85 cannot be added to storage pool Plevel1. Allocation map cannot accommodate disks larger than 4194555 MB. Checking Allocation Map for storage pool Plevel1 mmadddisk: tsadddisk failed. Verifying file system configuration information ... mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. mmadddisk: Command failed. Examine previous error messages to determine cause. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 10 17:29:42 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 10 Jul 2018 16:29:42 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? Message-ID: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> Right - but it doesn?t give me the answer on how to best get around it. :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of IBM Spectrum Scale Reply-To: gpfsug main discussion list Date: Tuesday, July 10, 2018 at 11:18 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Tue Jul 10 17:59:17 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Tue, 10 Jul 2018 12:59:17 -0400 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> References: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> Message-ID: <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> I would as I suggested add the new NSD into a new pool in the same filesystem. Then I would migrate all the files off the old pool onto the new one. At this point you can deldisk the old ones or decide what else you?d want to do with them. -- ddj Dave Johnson > On Jul 10, 2018, at 12:29 PM, Oesterlin, Robert wrote: > > Right - but it doesn?t give me the answer on how to best get around it. :-) > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > From: on behalf of IBM Spectrum Scale > Reply-To: gpfsug main discussion list > Date: Tuesday, July 10, 2018 at 11:18 AM > To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? > > The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. > > Fred > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scrusan at ddn.com Tue Jul 10 18:09:48 2018 From: scrusan at ddn.com (Steve Crusan) Date: Tue, 10 Jul 2018 17:09:48 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: <4E48904C-5B98-485B-B577-85532C7593A8@ddn.com> I?ve used ?preferDesignatedMnode=1? in the past, but that was for a specific usecase, and that would have to come from the direction of support. 
I guess if you wanted to test your metanode theory, you could open that file (and keep it open) on node from a different remote cluster, or one of your local NSD servers and see what kind of results you get out of it. ---- Steve Crusan scrusan at ddn.com (719) 695-3190 From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Tuesday, July 10, 2018 at 11:16 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes I would start by making sure that the application(s)... open the file O_RDONLY and then you may want to fiddle with the GPFS atime settings: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm At first I thought "uge" was a typo, but I guess you are referring to: https://supcom.hgc.jp/english/utili_info/manual/uge.html Still not begin familiar, it would be "interesting" to know from a file operations point of view, what's going on in terms of opens, reads, closes : per second. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 18:19:47 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 13:19:47 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? 
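(A quick, non-authoritative way to collect the details asked for above, runnable on any node in the cluster; all three are standard administration commands:)

mmdiag --version             # the daemon build actually running on this node
mmlsconfig minReleaseLevel   # the lowest release level the cluster is committed to
mmlscluster                  # the node list at the end gives the cluster size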
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From renata at slac.stanford.edu Tue Jul 10 19:35:28 2018 From: renata at slac.stanford.edu (Renata Maria Dart) Date: Tue, 10 Jul 2018 11:35:28 -0700 (PDT) Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues In-Reply-To: References: Message-ID: Hi, many thanks to all of the suggestions for how to deal with this issue. Ftr, I tried this mmchnode --noquorum -N --force on the node that was reinstalled which reinstated some of the communications between the cluster nodes, but then when I restarted the cluster, communications begain to fail again, complaining about not enough CCR nodes for quorum. I ended up reinstalling the cluster since at this point the nodes couldn't mount the remote data and I thought it would be faster. Thanks again for all of the responses, Renata Dart SLAC National Accelerator Lab On Wed, 27 Jun 2018, IBM Spectrum Scale wrote: > >Hi Renata, > >You may want to reduce the set of quorum nodes. If your version supports >the --force option, you can run > >mmchnode --noquorum -N --force > >It is a good idea to configure tiebreaker disks in a cluster that has only >2 quorum nodes. 
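(For the tiebreaker-disk suggestion above, a minimal sketch once the cluster is healthy again. The NSD names are placeholders; use one or three NSDs that every quorum node can access, and note that on some code levels the daemon must be down cluster-wide while this setting is changed:)

mmchconfig tiebreakerDisks="nsd_a;nsd_b;nsd_c"   # one or three tiebreaker NSDs
mmlsconfig tiebreakerDisks                       # verify the new setting
# mmchconfig tiebreakerDisks=no                  # reverts to node-based quorum only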
> >Regards, The Spectrum Scale (GPFS) team > >------------------------------------------------------------------------------------------------------------------ > >If you feel that your question can benefit other users of Spectrum Scale >(GPFS), then please post it to the public IBM developerWroks Forum at >https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > >If your query concerns a potential software error in Spectrum Scale (GPFS) >and you have an IBM software maintenance contract please contact >1-800-237-5511 in the United States or your local IBM Service Center in >other countries. > >The forum is informally monitored as time permits and should not be used >for priority messages to the Spectrum Scale (GPFS) team. > > > >From: Renata Maria Dart >To: gpfsug-discuss at spectrumscale.org >Date: 06/27/2018 02:21 PM >Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues >Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > >Hi, we have a client cluster of 4 nodes with 3 quorum nodes. One of the >quorum nodes is no longer in service and the other was reinstalled with >a newer OS, both without informing the gpfs admins. Gpfs is still >"working" on the two remaining nodes, that is, they continue to have access >to the gpfs data on the remote clusters. But, I can no longer get >any gpfs commands to work. On one of the 2 nodes that are still serving >data, > >root at ocio-gpu01 ~]# mmlscluster >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmlscluster: Command failed. Examine previous error messages to determine >cause. > > >On the reinstalled node, this fails in the same way: > >[root at ocio-gpu02 ccr]# mmstartup >get file failed: Not enough CCR quorum nodes available (err 809) >gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 >mmstartup: Command failed. Examine previous error messages to determine >cause. > > >I have looked through the users group interchanges but didn't find anything >that seems to fit this scenario. > >Is there a way to salvage this cluster? Can it be done without >shutting gpfs down on the 2 nodes that continue to work? > >Thanks for any advice, > >Renata Dart >SLAC National Accelerator Lb > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > From bbanister at jumptrading.com Tue Jul 10 21:50:23 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 10 Jul 2018 20:50:23 +0000 Subject: [gpfsug-discuss] Allocation map limits - any way around this? In-Reply-To: <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> References: <787C309B-BE1E-47E4-B604-5E43262AFB26@nuance.com> <72EAA3FB-5BAE-4C42-BC94-D9E98B4C11E7@brown.edu> Message-ID: +1 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of david_johnson at brown.edu Sent: Tuesday, July 10, 2018 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this? Note: External Email ________________________________ I would as I suggested add the new NSD into a new pool in the same filesystem. Then I would migrate all the files off the old pool onto the new one. At this point you can deldisk the old ones or decide what else you?d want to do with them. 
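(A rough sketch of that sequence, reusing the names from Bob's post; the new pool name "Plevel2", the temporary file names, and the old NSD name in the last step are invented for illustration. Validate with -I test before committing anything:)

# 1. Add the 5 TB NSD as the first disk of a brand-new data pool -- a new pool
#    gets a fresh allocation map sized for the disks it starts with.
cat > /tmp/new.nsd <<'EOF'
%nsd: nsd=stor1v5tb85 usage=dataOnly pool=Plevel2
EOF
mmadddisk fs1 -F /tmp/new.nsd

# 2. Send new writes to the new pool and drain the old one.
cat > /tmp/migrate.pol <<'EOF'
RULE 'drain'   MIGRATE FROM POOL 'Plevel1' TO POOL 'Plevel2'
RULE 'default' SET POOL 'Plevel2'
EOF
mmchpolicy fs1 /tmp/migrate.pol -I test         # dry-run the placement policy first
mmchpolicy fs1 /tmp/migrate.pol
mmapplypolicy fs1 -P /tmp/migrate.pol -I test   # preview what would be moved
mmapplypolicy fs1 -P /tmp/migrate.pol -I yes    # actually move the data

# 3. Once Plevel1 reports empty (mmdf fs1 -P Plevel1), remove its disks;
#    mmdeldisk migrates any remaining blocks off as part of the removal.
mmdeldisk fs1 "stor1v5tb85old1"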
-- ddj Dave Johnson On Jul 10, 2018, at 12:29 PM, Oesterlin, Robert > wrote: Right - but it doesn?t give me the answer on how to best get around it. :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of IBM Spectrum Scale > Reply-To: gpfsug main discussion list > Date: Tuesday, July 10, 2018 at 11:18 AM To: gpfsug main discussion list > Subject: [EXTERNAL] Re: [gpfsug-discuss] Allocation map limits - any way around this? The only additional piece of information I would add is that you can see what the maximum NSD size is defined for a pool by looking at the output of mmdf. Fred _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jul 10 22:06:27 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 21:06:27 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. 
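(An aside for readers following the thread: the per-node byte counts Peter mentions come from mmpmon, and sampling them on the suspect nodes is cheap. A sketch, assuming the command-file style of invocation; the 60-second interval and the /tmp path are arbitrary, and the parseable field carrying bytes read is _br_ as far as I recall:)

cat > /tmp/pmon.cmd <<'EOF'
io_s
EOF
# -p parseable output, -r 5 samples, -d 60000 ms apart; the counters are
# cumulative, so what matters is the delta between samples on each node.
mmpmon -p -i /tmp/pmon.cmd -r 5 -d 60000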
It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 10 22:12:16 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 10 Jul 2018 21:12:16 +0000 Subject: [gpfsug-discuss] What NSDs does a file have blocks on? 
In-Reply-To: References: <4C811F21-849F-48E1-83DF-ADE3BBBBE33B@nuance.com> <27572009-ACD1-4317-A335-301D42E99BDE@bham.ac.uk> Message-ID: <5565130575454bf7a80802ecd55faec3@jumptrading.com> I know we are trying to be helpful, but suggesting that admins mess with undocumented, dangerous commands isn?t a good idea. If directed from an IBM support person with explicit instructions, then good enough, IFF it?s really required and worth the risk! I think the Kum?s suggestions are definitely a right way to handle this. In general, avoid running ts* commands unless directed by somebody that knows exactly what they are doing and understands your issue in great detail!! Just a word to the wise.. 2 cents? etc, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Tuesday, July 10, 2018 8:09 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Note: External Email ________________________________ As long as we're giving hints... Seems tsdbfs has several subcommands that might be helpful. I like "inode" But there's also "listda" Subcommand "desc" will show you the structure of the file system under "disks:" you will see which disk numbers are which NSDs. Have fun, but DO NOT use the any of the *patch* subcommands! From: Simon Thompson > To: gpfsug main discussion list > Date: 07/09/2018 05:21 PM Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I was going to say something like that ? e.g. blockaddr 563148261 Inode 563148261 snap 0 offset 0 N=2 1:45255923200 13:59403784320 1: and 13: in the output are the NSD disk devices for inode 563148261 Simon From: > on behalf of "makaplan at us.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 9 July 2018 at 22:04 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on? (psss... ) tsdbfs Not responsible for anything bad that happens...! _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
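(For anyone wanting the same answer without touching tsdbfs at all, there is a coarser but fully supported route. It will not give block addresses, only the set of NSDs the blocks can possibly live on; the path and file system names below are placeholders:)

mmlsattr -L /gpfs/fs1/path/to/file   # reports the storage pool holding the file's data
mmlsdisk fs1 -L                      # maps every NSD to its storage pool and failure group
# The file's data blocks can only be on NSDs belonging to that pool.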
-------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Tue Jul 10 22:23:34 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 10 Jul 2018 21:23:34 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, , Message-ID: Oh the cluster is 296 nodes currently with a set size of 300 (mmcrfs -n 300) We're currently looking to upgrade the 1G connected nodes to 10G within the next few months. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Peter Childs wrote ---- The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Jul 10 23:15:01 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 10 Jul 2018 18:15:01 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, , Message-ID: Regarding the permissions on the file I assume you are not using ACLs, correct? If you are then you would need to check what the ACL allows. Is your metadata on separate NSDs? Having metadata on separate NSDs, and preferably fast NSDs, would certainly help your mmbackup scanning. Have you looked at the information from netstat or similar network tools to see how your network is performing? Faster networks generally require a bit of OS tuning and some GPFS tuning to optimize their performance. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: gpfsug main discussion list Date: 07/10/2018 05:23 PM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org Oh the cluster is 296 nodes currently with a set size of 300 (mmcrfs -n 300) We're currently looking to upgrade the 1G connected nodes to 10G within the next few months. 
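(Following up on the netstat suggestion above, a few low-risk checks worth running on the 1G-connected nodes; the interface name eth0 is a placeholder:)

netstat -s | grep -i retrans             # growing TCP retransmit counts point at congestion
ethtool eth0 | grep -iE 'speed|duplex'   # confirm what the link actually negotiated
mmdiag --network                         # GPFS's own view of its connections to the other nodes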
Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Peter Childs wrote ---- The reason I think the metanode is moving around is I'd done a limited amount of trying to track it down using "mmfsadm saferdump file" and it moved before I'd tracked down the correct metanode. But I might have been chasing ghosts, so it may be operating normally and nothing to worry about. The user reading the file only has read access to it from the file permissions, Mmbackup has only slowed down while this job has been running. As I say the scan for what to backup usally takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks) but that is usally a sign of a bad disk and pulling the disk and rebuilding the RAID "fixed" that straight away. I cant see anything like that currently however. It might be that its network congestion were suffering from and nothing to do with token management but as the mmpmon bytes read data is running very high with this job and the load is spread over 50+ nodes it's difficult to see one culprit. It's a mixed speed ethernet network mainly 10GB connected although the nodes in question are legacy with only 1GB connections (and 40GB to the back of the storage. We're currently running 4.2.3-8 Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- IBM Spectrum Scale wrote ---- What is in the dump that indicates the metanode is moving around? Could you please provide an example of what you are seeing? You noted that the access is all read only, is the file opened for read only or for read and write? What makes you state that this particular file is interfering with the scan done by mmbackup? Reading a file, no matter how large should significantly impact a policy scan. What version of Spectrum Scale are you running and how large is your cluster? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 10:51 AM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. 
It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Wed Jul 11 13:30:16 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Wed, 11 Jul 2018 14:30:16 +0200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From heiner.billich at psi.ch Wed Jul 11 14:40:46 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Wed, 11 Jul 2018 13:40:46 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown Message-ID: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Hello, I have two nodes which hang on ?mmshutdown?, in detail the command ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I wonder if this looks familiar to somebody? Is it a known bug? I can avoid the issue if I reduce pagepool from 128G to 64G. Running ?systemctl stop gpfs? shows the same issue. It forcefully terminates after a while, but ?rmmod? stays stuck. Two functions cxiReleaseAndForgetPages and put_page seem to be involved, the first part of gpfs, the second a kernel call. The servers have 256G memory and 72 (virtual) cores each. I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. I can try to switch back to 5.0.0 Thank you & kind regards, Heiner Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum Scale service process is running Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is not able to form a quorum with the other available nodes. Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 [preauth] Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[rmmod:2695] Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] [] put_compound_page+0xc3/0x174 Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: 00000246 Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: 00000000fae3d201 RCX: 0000000000000284 Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: 0000000000000246 RDI: ffffea003d478000 Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: ffff881ffae3d1e0 R09: 0000000180800059 Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: ffffea007feb8f40 R12: 00000000fae3d201 Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: 0000000000000000 R15: ffff88161977bd40 Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:12:41 node-1.x.y kernel: Call Trace: Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 Jul 11 14:12:41 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] ? 
kmem_cache_free+0x1e2/0x200 Jul 11 14:12:41 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:12:41 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 41 0f ba 2c Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. Terminating. Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [rmmod:2695] Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jul 11 14:13:27 node-1.x.y kernel: { Jul 11 14:13:27 node-1.x.y kernel: 28 Jul 11 14:13:27 node-1.x.y kernel: } Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, g=267734, c=267733, q=36089) Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: Jul 11 14:13:27 node-1.x.y kernel: rmmod R Jul 11 14:13:27 node-1.x.y kernel: running task Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __free_slab+0xdc/0x200 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? 
system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] [] __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: 00000282 Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: 0000000000000135 RCX: 00000000000001c1 Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: 0000000000000246 RDI: ffffea00650e7040 Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: ffff881ffae3df60 R09: 0000000180800052 Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: ffffea007feb8f40 R12: ffff881ffae3df60 Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: 00000000fae3db01 R15: ffffea007feb8f40 Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 48 89 fb f6 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Wed Jul 11 14:47:06 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 11 Jul 2018 06:47:06 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Message-ID: Hi, what does numactl -H report ? also check if this is set to yes : root at fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello, > > > > I have two nodes which hang on ?mmshutdown?, in detail the command > ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I > wonder if this looks familiar to somebody? Is it a known bug? I can avoid > the issue if I reduce pagepool from 128G to 64G. > > > > Running ?systemctl stop gpfs? shows the same issue. It forcefully > terminates after a while, but ?rmmod? stays stuck. > > > > Two functions cxiReleaseAndForgetPages and put_page seem to be involved, > the first part of gpfs, the second a kernel call. > > > > The servers have 256G memory and 72 (virtual) cores each. > > I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. > > > > I can try to switch back to 5.0.0 > > > > Thank you & kind regards, > > > > Heiner > > > > > > > > Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum > Scale service process not running on this node. Normal operation cannot be > done > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum > Scale service process is running > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is > not able to form a quorum with the other available nodes. > > Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 > [preauth] > > > > Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 23s! 
[rmmod:2695] > > > > Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 > i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > > Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe > mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp > crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca > dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] > > Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] > [] put_compound_page+0xc3/0x174 > > Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: > 00000246 > > Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: > 00000000fae3d201 RCX: 0000000000000284 > > Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: > 0000000000000246 RDI: ffffea003d478000 > > Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: > ffff881ffae3d1e0 R09: 0000000180800059 > > Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: > ffffea007feb8f40 R12: 00000000fae3d201 > > Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: > 0000000000000000 R15: ffff88161977bd40 > > Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > > > Jul 11 14:12:41 node-1.x.y kernel: Call Trace: > > Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] ? 
> kmem_cache_free+0x1e2/0x200 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:12:41 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff > ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 > f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 > 41 0f ba 2c > > > > Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. > Terminating. > > > > Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 21s! [rmmod:2695] > > > > Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > > Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on > CPUs/tasks: > > Jul 11 14:13:27 node-1.x.y kernel: { > > Jul 11 14:13:27 node-1.x.y kernel: 28 > > Jul 11 14:13:27 node-1.x.y kernel: } > > Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, > g=267734, c=267733, q=36089) > > Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: > > Jul 11 14:13:27 node-1.x.y kernel: rmmod R > > Jul 11 14:13:27 node-1.x.y kernel: running task > > Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __free_slab+0xdc/0x200 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > mmfs+0xc85/0xca0 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter > > Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl > lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul > mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa > pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: tracedev] > > Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] > [] __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: > 00000282 > > Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: > 0000000000000135 RCX: 00000000000001c1 > > Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: > 0000000000000246 RDI: ffffea00650e7040 > > Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: > ffff881ffae3df60 R09: 0000000180800052 > > Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: > ffffea007feb8f40 R12: ffff881ffae3df60 > > Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: > 00000000fae3db01 R15: ffffea007feb8f40 > > Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f > 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 > df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 > 48 89 fb f6 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 11 15:32:37 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 11 Jul 2018 14:32:37 +0000 Subject: [gpfsug-discuss] mmdiag --iohist question Message-ID: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacekm at img.cas.cz Thu Jul 12 07:46:22 2018 From: zacekm at img.cas.cz (Michal Zacek) Date: Thu, 12 Jul 2018 08:46:22 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory Message-ID: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3776 bytes Desc: Elektronicky podpis S/MIME URL: From S.J.Thompson at bham.ac.uk Thu Jul 12 09:04:11 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 12 Jul 2018 08:04:11 +0000 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> Message-ID: <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon ?On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal From Renar.Grunenberg at huk-coburg.de Thu Jul 12 09:17:37 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 12 Jul 2018 08:17:37 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Message-ID: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From smita.raut at in.ibm.com Thu Jul 12 09:39:20 2018 From: smita.raut at in.ibm.com (Smita J Raut) Date: Thu, 12 Jul 2018 14:09:20 +0530 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Message-ID: If ABCD is not a fileset then below rule can be used- RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE ' /gpfs/gpfs01/ABCD/%' Thanks, Smita From: Simon Thompson To: gpfsug main discussion list Date: 07/12/2018 01:34 PM Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. Best regards, Michal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 12 09:40:06 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 12 Jul 2018 08:40:06 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Message-ID: <34BB4D15-5F76-453B-AC8C-FF5096133296@bham.ac.uk> How are the disks attached? We have some IB/SRP storage that is sometimes a little slow to appear in multipath and have seen this in the past (we since set autoload=off and always check multipath before restarting GPFS on the node). Simon From: on behalf of "Renar.Grunenberg at huk-coburg.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 12 July 2018 at 09:17 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zacekm at img.cas.cz Thu Jul 12 09:49:38 2018 From: zacekm at img.cas.cz (Michal Zacek) Date: Thu, 12 Jul 2018 10:49:38 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz> <8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> Message-ID: <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> That's perfect, thank you both. Best regards Michal Dne 12.7.2018 v 10:39 Smita J Raut napsal(a): > If ABCD is not a fileset then below rule can be used- > > RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE > '/gpfs/gpfs01/ABCD/%' > > Thanks, > Smita > > > > From: Simon Thompson > To: gpfsug main discussion list > Date: 07/12/2018 01:34 PM > Subject: Re: [gpfsug-discuss] File placement rule for new files in > directory > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Is ABCD a fileset? If so, its easy with something like: > > RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') > > Simon > > On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of zacekm at img.cas.cz" on behalf of zacekm at img.cas.cz> wrote: > > ? ?Hello, > > ? ?it is possible to create file placement policy for new files in one > ? ?directory? I need something like this --> All new files created in > ? ?directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". > ? ?Thanks. > > ? ?Best regards, > ? ?Michal > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3776 bytes Desc: Elektronicky podpis S/MIME URL: From Achim.Rehor at de.ibm.com Thu Jul 12 10:47:26 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Thu, 12 Jul 2018 11:47:26 +0200 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Thu Jul 12 11:01:29 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 12 Jul 2018 10:01:29 +0000 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> Message-ID: <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> Hallo Achim, hallo Simon, first thanks for your answers. I think Achims answers map these at best. The nsd-servers (only 2) for these disk were mistakenly restart in a same time window. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Achim Rehor Gesendet: Donnerstag, 12. Juli 2018 11:47 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Hi Renar, whenever an access to a NSD happens, there is a potential that the node cannot access the disk, so if the (only) NSD server is down, there will be no chance to access the disk, and the disk will be set down. If you have twintailed disks, the 'second' (or possibly some more) NSD server will be asked, switching to networked access, and in that case only if that also fails, the disk will be set to down as well. Not sure how your setup is, but if you reboot 2 NSD servers, and some client possibly did IO to a file served by just these 2, then the 'down' state would be explainable. Rebooting of an NSD server should never set a disk to down, except, he was the only one serving that NSD. Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:image001.gif at 01D419D7.A9373E60] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 
7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 12/07/2018 10:17 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 7182 bytes Desc: image001.gif URL: From scale at us.ibm.com Thu Jul 12 12:33:39 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 12 Jul 2018 07:33:39 -0400 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot In-Reply-To: <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> References: <173350defb3b4b1683b2c82fff9b0f3b@SMXRF105.msg.hukrf.de> <63cd931c1977483089ad2d9546803461@SMXRF105.msg.hukrf.de> Message-ID: Just to follow up on the question about where to learn why a NSD is marked down you should see a message in the GPFS log, /var/adm/ras/mmfs.log.* Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Date: 07/12/2018 06:01 AM Subject: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Achim, hallo Simon, first thanks for your answers. I think Achims answers map these at best. The nsd-servers (only 2) for these disk were mistakenly restart in a same time window. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Achim Rehor Gesendet: Donnerstag, 12. 
Juli 2018 11:47 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Analyse steps if disk are down after reboot Hi Renar, whenever an access to a NSD happens, there is a potential that the node cannot access the disk, so if the (only) NSD server is down, there will be no chance to access the disk, and the disk will be set down. If you have twintailed disks, the 'second' (or possibly some more) NSD server will be asked, switching to networked access, and in that case only if that also fails, the disk will be set to down as well. Not sure how your setup is, but if you reboot 2 NSD servers, and some client possibly did IO to a file served by just these 2, then the 'down' state would be explainable. Rebooting of an NSD server should never set a disk to down, except, he was the only one serving that NSD. Mit freundlichen Gr??en / Kind regards Achim Rehor Software Technical Support Specialist AIX/ Emea HPC Support IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 12/07/2018 10:17 Subject: [gpfsug-discuss] Analyse steps if disk are down after reboot Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo All, we see after a reboot of two NSD-Servers some disks in different filesystems are down and we don?t see why. The logs (messages, dmesg, kern,..) are saying nothing. We are on Rhel7.4 and SS 5.0.1.1. The question now, there are any log, structures in the gpfs deamon that log these situation? What was the reason why the deamon hast no access to the disks at that startup phase. Any hints are appreciated. Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. 
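As a rough illustration of the log check described above (the command names are standard GPFS administration commands, but the exact wording of the log entries varies by release, so treat this as a sketch rather than a fixed procedure), on each NSD server that was rebooted one could run:

  grep -iE 'down|error' /var/adm/ras/mmfs.log.latest   # why the disk was marked down
  mmlsdisk <filesystem> -e                             # disks that are not up/ready
  mmchdisk <filesystem> start -a                       # restart the down disks once the paths are back

Here <filesystem> is a placeholder for the affected file system. The mmfs.log entry written at the moment the disk went down should point at the I/O error or the unreachable NSD server that triggered it, which answers the "why" part of the question.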
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From UWEFALKE at de.ibm.com Thu Jul 12 14:16:23 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 12 Jul 2018 15:16:23 +0200 Subject: [gpfsug-discuss] File placement rule for new files in directory In-Reply-To: <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz><8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk> <3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> Message-ID: If that has not changed, then: PATH_NAME is not usable for placement policies. Only the FILESET_NAME attribute is accepted. One might think, that PATH_NAME is as known on creating a new file as is FILESET_NAME, but for some reason the documentation says: "When file attributes are referenced in initial placement rules, only the following attributes are valid: FILESET_NAME, GROUP_ID, NAME, and USER_ID. " Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 12/07/2018 10:49 Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org That's perfect, thank you both. Best regards Michal Dne 12.7.2018 v 10:39 Smita J Raut napsal(a): If ABCD is not a fileset then below rule can be used- RULE 'ABCD-rule-01' SET POOL 'fastdata' WHERE PATH_NAME LIKE '/gpfs/gpfs01/ABCD/%' Thanks, Smita From: Simon Thompson To: gpfsug main discussion list Date: 07/12/2018 01:34 PM Subject: Re: [gpfsug-discuss] File placement rule for new files in directory Sent by: gpfsug-discuss-bounces at spectrumscale.org Is ABCD a fileset? If so, its easy with something like: RULE 'ABCD-rule-01' SET POOL 'fastdata' FOR FILESET ('ABCD-fileset-name') Simon On 12/07/2018, 07:56, "gpfsug-discuss-bounces at spectrumscale.org on behalf of zacekm at img.cas.cz" wrote: Hello, it is possible to create file placement policy for new files in one directory? I need something like this --> All new files created in directory "/gpfs/gpfs01/ABCD" will be stored in pool "fastdata". Thanks. 
Best regards, Michal _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss [attachment "smime.p7s" deleted by Uwe Falke/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heiner.billich at psi.ch Thu Jul 12 14:30:43 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 12 Jul 2018 13:30:43 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> Message-ID: <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Hello Sven, Thank you. I did enable numaMemorInterleave but the issues stays. In the meantime I switched to version 5.0.0-2 just to see if it?s version dependent ? it?s not. All gpfs filesystems are unmounted when this happens. At shutdown I often need to do a hard reset to force a reboot ? o.k., I never waited more than 5 minutes once I saw a hang, maybe it would recover after some more time. ?rmmod mmfs26? doesn?t hang all the times, maybe at every other shutdown or mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow, command like ?ps -efH? or ?history? take a long time and some mm commands just block, a few times the system gets completely inaccessible. I?ll reinstall the systems and move back to 4.2.3-8 and see if this is a stable configuration to start from an to rule out any hardware/BIOS issues. I append output from numactl -H below. Cheers, Heiner Test with 5.0.0-2 [root at xbl-ces-2 ~]# numactl -H available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 node 0 size: 130942 MB node 0 free: 60295 MB node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 node 1 size: 131072 MB node 1 free: 60042 MB node distances: node 0 1 0: 10 21 1: 21 10 [root at xbl-ces-2 ~]# mmdiag --config | grep numaM ! numaMemoryInterleave yes # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=/dev/mapper/vg_root-lv_root ro crashkernel=auto rd.lvm.lv=vg_root/lv_root console=tty0 console=ttyS0,115200 nosmap Example output of ps -efH during mmshutdown when rmmod did hang (last line) This is with 5.0.0-2. As I see all gpfs processe already terminated, just root 1 0 0 14:30 ? 00:00:10 /usr/lib/systemd/systemd --switched-root --system --deserialize 21 root 1035 1 0 14:30 ? 00:00:02 /usr/lib/systemd/systemd-journald root 1055 1 0 14:30 ? 00:00:00 /usr/sbin/lvmetad -f root 1072 1 0 14:30 ? 00:00:11 /usr/lib/systemd/systemd-udevd root 1478 1 0 14:31 ? 00:00:00 /usr/sbin/sssd -i -f root 1484 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain D.PSI.CH --uid 0 --gid 0 --debug-to-files root 1486 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files root 1487 1478 0 14:31 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files root 1479 1 0 14:31 ? 00:00:00 /usr/sbin/rasdaemon -f -r root 1482 1 0 14:31 ? 00:00:04 /usr/sbin/irqbalance --foreground dbus 1483 1 0 14:31 ? 00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation root 1496 1 0 14:31 ? 
00:00:00 /usr/sbin/smartd -n -q never root 1498 1 0 14:31 ? 00:00:00 /usr/sbin/gssproxy -D nscd 1507 1 0 14:31 ? 00:00:01 /usr/sbin/nscd nrpe 1526 1 0 14:31 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d root 1531 1 0 14:31 ? 00:00:00 /usr/lib/systemd/systemd-logind root 1533 1 0 14:31 ? 00:00:00 /usr/sbin/rpc.gssd root 1803 1 0 14:31 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220 root 1804 1 0 14:31 tty1 00:00:00 /sbin/agetty --noclear tty1 linux root 2405 1 0 14:32 ? 00:00:00 /sbin/dhclient -q -cf /etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l root 2461 1 0 14:32 ? 00:00:00 /usr/sbin/sshd -D root 11561 2461 0 14:35 ? 00:00:00 sshd: root at pts/0 root 11565 11561 0 14:35 pts/0 00:00:00 -bash root 16024 11565 0 14:50 pts/0 00:00:05 ps -efH root 11609 2461 0 14:35 ? 00:00:00 sshd: root at pts/1 root 11644 11609 0 14:35 pts/1 00:00:00 -bash root 2718 1 0 14:32 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 0 no root 2758 1 0 14:32 ? 00:00:00 /usr/libexec/postfix/master -w postfix 2785 2758 0 14:32 ? 00:00:00 pickup -l -t unix -u postfix 2786 2758 0 14:32 ? 00:00:00 qmgr -l -t unix -u root 3174 1 0 14:32 ? 00:00:00 /usr/sbin/crond -n ntp 3179 1 0 14:32 ? 00:00:00 /usr/sbin/ntpd -u ntp:ntp -g root 3915 1 3 14:32 ? 00:00:33 python /usr/lpp/mmfs/bin/mmsysmon.py root 13618 1 0 14:36 ? 00:00:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes no root 15936 1 0 14:49 pts/1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs root 15992 15936 0 14:49 pts/1 00:00:00 /sbin/rmmod mmfs26 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Wednesday 11 July 2018 at 15:47 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown Hi, what does numactl -H report ? also check if this is set to yes : root at fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) > wrote: Hello, I have two nodes which hang on ?mmshutdown?, in detail the command ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I wonder if this looks familiar to somebody? Is it a known bug? I can avoid the issue if I reduce pagepool from 128G to 64G. Running ?systemctl stop gpfs? shows the same issue. It forcefully terminates after a while, but ?rmmod? stays stuck. Two functions cxiReleaseAndForgetPages and put_page seem to be involved, the first part of gpfs, the second a kernel call. The servers have 256G memory and 72 (virtual) cores each. I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. I can try to switch back to 5.0.0 Thank you & kind regards, Heiner Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum Scale service process is running Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is not able to form a quorum with the other available nodes. Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 [preauth] Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[rmmod:2695] Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] [] put_compound_page+0xc3/0x174 Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: 00000246 Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: 00000000fae3d201 RCX: 0000000000000284 Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: 0000000000000246 RDI: ffffea003d478000 Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: ffff881ffae3d1e0 R09: 0000000180800059 Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: ffffea007feb8f40 R12: 00000000fae3d201 Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: 0000000000000000 R15: ffff88161977bd40 Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:12:41 node-1.x.y kernel: Call Trace: Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 Jul 11 14:12:41 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] ? 
kmem_cache_free+0x1e2/0x200 Jul 11 14:12:41 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:12:41 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:12:41 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:12:41 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 41 0f ba 2c Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. Terminating. Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 21s! [rmmod:2695] Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on CPUs/tasks: Jul 11 14:13:27 node-1.x.y kernel: { Jul 11 14:13:27 node-1.x.y kernel: 28 Jul 11 14:13:27 node-1.x.y kernel: } Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, g=267734, c=267733, q=36089) Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: Jul 11 14:13:27 node-1.x.y kernel: rmmod R Jul 11 14:13:27 node-1.x.y kernel: running task Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __free_slab+0xdc/0x200 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: Jul 11 14:13:27 node-1.x.y kernel: [] ? 
system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: ffff881619778000 task.ti: ffff881619778000 Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] [] __put_compound_page+0x22/0x22 Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: 00000282 Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: 0000000000000135 RCX: 00000000000001c1 Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: 0000000000000246 RDI: ffffea00650e7040 Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: ffff881ffae3df60 R09: 0000000180800052 Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: ffffea007feb8f40 R12: ffff881ffae3df60 Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: 00000000fae3db01 R15: ffffea007feb8f40 Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) GS:ffff883ffee80000(0000) knlGS:0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: 0000000c36b2c000 CR4: 00000000001607e0 Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jul 11 14:13:27 node-1.x.y kernel: Call Trace: Jul 11 14:13:27 node-1.x.y kernel: [] ? put_page+0x45/0x50 Jul 11 14:13:27 node-1.x.y kernel: [] cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiDeallocPageList+0x45/0x110 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] gpfs_clean+0x26/0x30 [mmfslinux] Jul 11 14:13:27 node-1.x.y kernel: [] cleanup_module+0x25/0x30 [mmfs26] Jul 11 14:13:27 node-1.x.y kernel: [] SyS_delete_module+0x19b/0x300 Jul 11 14:13:27 node-1.x.y kernel: [] system_call_fastpath+0x16/0x1b Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 48 89 fb f6 -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
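A small diagnostic sketch for the next time rmmod hangs (this assumes the magic-sysrq facility is enabled on the node, e.g. kernel.sysrq=1, and root access; it only gathers data, it does not fix anything):

  cat /proc/$(pgrep -x rmmod)/stack   # kernel stack of the rmmod process (may be unstable while it is spinning)
  echo l > /proc/sysrq-trigger        # stack backtrace of all active CPUs, catches a spinning thread
  echo w > /proc/sysrq-trigger        # all blocked (D-state) tasks
  dmesg | tail -n 300                 # collect the dumps from the kernel log

Given that the soft-lockup messages above already show rmmod in the put_page/cxiReleaseAndForgetPages path, such a dump should confirm whether it is still the same page-release code path after the pagepool or version changes.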
URL: From oehmes at gmail.com Thu Jul 12 14:40:15 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 12 Jul 2018 06:40:15 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Message-ID: if that happens it would be interesting what top reports start top in a large resolution window (like 330x80) , press shift-H , this will break it down per Thread, also press 1 to have a list of each cpu individually and see if you can either spot one core on the top list with 0% idle or on the thread list on the bottom if any of the threads run at 100% core speed. attached is a screenshot which columns to look at , this system is idle, so nothing to see, just to show you where to look does this machine by any chance has either large maxfilestochache or is a token server ? [image: image.png] sven On Thu, Jul 12, 2018 at 6:30 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello Sven, > > > > Thank you. I did enable numaMemorInterleave but the issues stays. > > > > In the meantime I switched to version 5.0.0-2 just to see if it?s version > dependent ? it?s not. All gpfs filesystems are unmounted when this happens. > > > > At shutdown I often need to do a hard reset to force a reboot ? o.k., I > never waited more than 5 minutes once I saw a hang, maybe it would recover > after some more time. > > > > ?rmmod mmfs26? doesn?t hang all the times, maybe at every other shutdown > or mmstartup/mmshutdown cycle. While rmmod hangs the system seems slow, > command like ?ps -efH? or ?history? take a long time and some mm commands > just block, a few times the system gets completely inaccessible. > > > > I?ll reinstall the systems and move back to 4.2.3-8 and see if this is a > stable configuration to start from an to rule out any hardware/BIOS issues. > > > > I append output from numactl -H below. > > > > Cheers, > > > > Heiner > > > > Test with 5.0.0-2 > > > > [root at xbl-ces-2 ~]# numactl -H > > available: 2 nodes (0-1) > > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 > 42 43 44 45 46 47 48 49 50 51 52 53 > > node 0 size: 130942 MB > > node 0 free: 60295 MB > > node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 > 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 > > node 1 size: 131072 MB > > node 1 free: 60042 MB > > node distances: > > node 0 1 > > 0: 10 21 > > 1: 21 10 > > > > [root at xbl-ces-2 ~]# mmdiag --config | grep numaM > > ! numaMemoryInterleave yes > > > > # cat /proc/cmdline > > BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 > root=/dev/mapper/vg_root-lv_root ro crashkernel=auto rd.lvm.lv=vg_root/lv_root > console=tty0 console=ttyS0,115200 nosmap > > > > > > Example output of ps -efH during mmshutdown when rmmod did hang (last > line) This is with 5.0.0-2. As I see all gpfs processe already terminated, > just > > > > root 1 0 0 14:30 ? 00:00:10 /usr/lib/systemd/systemd > --switched-root --system --deserialize 21 > > root 1035 1 0 14:30 ? 00:00:02 > /usr/lib/systemd/systemd-journald > > root 1055 1 0 14:30 ? 00:00:00 /usr/sbin/lvmetad -f > > root 1072 1 0 14:30 ? 00:00:11 > /usr/lib/systemd/systemd-udevd > > root 1478 1 0 14:31 ? 00:00:00 /usr/sbin/sssd -i -f > > root 1484 1478 0 14:31 ? 00:00:00 > /usr/libexec/sssd/sssd_be --domain D.PSI.CH --uid 0 --gid 0 > --debug-to-files > > root 1486 1478 0 14:31 ? 
00:00:00 > /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files > > root 1487 1478 0 14:31 ? 00:00:00 > /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files > > root 1479 1 0 14:31 ? 00:00:00 /usr/sbin/rasdaemon -f -r > > root 1482 1 0 14:31 ? 00:00:04 /usr/sbin/irqbalance > --foreground > > dbus 1483 1 0 14:31 ? 00:00:00 /bin/dbus-daemon > --system --address=systemd: --nofork --nopidfile --systemd-activation > > root 1496 1 0 14:31 ? 00:00:00 /usr/sbin/smartd -n -q > never > > root 1498 1 0 14:31 ? 00:00:00 /usr/sbin/gssproxy -D > > nscd 1507 1 0 14:31 ? 00:00:01 /usr/sbin/nscd > > nrpe 1526 1 0 14:31 ? 00:00:00 /usr/sbin/nrpe -c > /etc/nagios/nrpe.cfg -d > > root 1531 1 0 14:31 ? 00:00:00 > /usr/lib/systemd/systemd-logind > > root 1533 1 0 14:31 ? 00:00:00 /usr/sbin/rpc.gssd > > root 1803 1 0 14:31 ttyS0 00:00:00 /sbin/agetty --keep-baud > 115200 38400 9600 ttyS0 vt220 > > root 1804 1 0 14:31 tty1 00:00:00 /sbin/agetty --noclear > tty1 linux > > root 2405 1 0 14:32 ? 00:00:00 /sbin/dhclient -q -cf > /etc/dhcp/dhclient-ib0.conf -lf /var/lib/dhclient/dhclient--ib0.l > > root 2461 1 0 14:32 ? 00:00:00 /usr/sbin/sshd -D > > root 11561 2461 0 14:35 ? 00:00:00 sshd: root at pts/0 > > root 11565 11561 0 14:35 pts/0 00:00:00 -bash > > root 16024 11565 0 14:50 pts/0 00:00:05 ps -efH > > root 11609 2461 0 14:35 ? 00:00:00 sshd: root at pts/1 > > root 11644 11609 0 14:35 pts/1 00:00:00 -bash > > root 2718 1 0 14:32 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh > /usr/lpp/mmfs/bin/mmccrmonitor 15 0 no > > root 2758 1 0 14:32 ? 00:00:00 > /usr/libexec/postfix/master -w > > postfix 2785 2758 0 14:32 ? 00:00:00 pickup -l -t unix -u > > postfix 2786 2758 0 14:32 ? 00:00:00 qmgr -l -t unix -u > > root 3174 1 0 14:32 ? 00:00:00 /usr/sbin/crond -n > > ntp 3179 1 0 14:32 ? 00:00:00 /usr/sbin/ntpd -u > ntp:ntp -g > > root 3915 1 3 14:32 ? 00:00:33 python > /usr/lpp/mmfs/bin/mmsysmon.py > > root 13618 1 0 14:36 ? 00:00:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 8192 yes > no > > root 15936 1 0 14:49 pts/1 00:00:00 /usr/lpp/mmfs/bin/mmksh > /usr/lpp/mmfs/bin/runmmfs > > root 15992 15936 0 14:49 pts/1 00:00:00 /sbin/rmmod mmfs26 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > *From: * on behalf of Sven > Oehme > *Reply-To: *gpfsug main discussion list > *Date: *Wednesday 11 July 2018 at 15:47 > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown > > > > Hi, > > > > what does numactl -H report ? > > > > also check if this is set to yes : > > > > root at fab3a:~# mmlsconfig numaMemoryInterleave > > numaMemoryInterleave yes > > > > Sven > > > > On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < > heiner.billich at psi.ch> wrote: > > Hello, > > > > I have two nodes which hang on ?mmshutdown?, in detail the command > ?/sbin/rmmod mmfs26? hangs. I get kernel messages which I append below. I > wonder if this looks familiar to somebody? Is it a known bug? I can avoid > the issue if I reduce pagepool from 128G to 64G. > > > > Running ?systemctl stop gpfs? shows the same issue. It forcefully > terminates after a while, but ?rmmod? stays stuck. > > > > Two functions cxiReleaseAndForgetPages and put_page seem to be involved, > the first part of gpfs, the second a kernel call. > > > > The servers have 256G memory and 72 (virtual) cores each. > > I run 5.0.1-1 on RHEL7.4 with kernel 3.10.0-693.17.1.el7.x86_64. 
> > > > I can try to switch back to 5.0.0 > > > > Thank you & kind regards, > > > > Heiner > > > > > > > > Jul 11 14:12:04 node-1.x.y mmremote[1641]: Unloading module mmfs26 > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The Spectrum > Scale service process not running on this node. Normal operation cannot be > done > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [I] Event raised: The Spectrum > Scale service process is running > > Jul 11 14:12:04 node-1.x.y mmsysmon[2440]: [E] Event raised: The node is > not able to form a quorum with the other available nodes. > > Jul 11 14:12:38 node-1.x.y sshd[2826]: Connection closed by xxx port 52814 > [preauth] > > > > Jul 11 14:12:41 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 23s! [rmmod:2695] > > > > Jul 11 14:12:41 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 > i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > > Jul 11 14:12:41 node-1.x.y kernel: sysimgblt fb_sys_fops ttm ixgbe > mlx4_core(OE) crct10dif_pclmul mdio mlx_compat(OE) crct10dif_common drm ptp > crc32c_intel devlink hpsa pps_core i2c_core scsi_transport_sas dca > dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tracedev] > > Jul 11 14:12:41 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:12:41 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:12:41 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:12:41 node-1.x.y kernel: RIP: 0010:[] > [] put_compound_page+0xc3/0x174 > > Jul 11 14:12:41 node-1.x.y kernel: RSP: 0018:ffff88161977bd50 EFLAGS: > 00000246 > > Jul 11 14:12:41 node-1.x.y kernel: RAX: 0000000000000283 RBX: > 00000000fae3d201 RCX: 0000000000000284 > > Jul 11 14:12:41 node-1.x.y kernel: RDX: 0000000000000283 RSI: > 0000000000000246 RDI: ffffea003d478000 > > Jul 11 14:12:41 node-1.x.y kernel: RBP: ffff88161977bd68 R08: > ffff881ffae3d1e0 R09: 0000000180800059 > > Jul 11 14:12:41 node-1.x.y kernel: R10: 00000000fae3d201 R11: > ffffea007feb8f40 R12: 00000000fae3d201 > > Jul 11 14:12:41 node-1.x.y kernel: R13: ffff88161977bd40 R14: > 0000000000000000 R15: ffff88161977bd40 > > Jul 11 14:12:41 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:12:41 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:12:41 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:12:41 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > > > Jul 11 
14:12:41 node-1.x.y kernel: Call Trace: > > Jul 11 14:12:41 node-1.x.y kernel: [] put_page+0x45/0x50 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] ? > kmem_cache_free+0x1e2/0x200 > > Jul 11 14:12:41 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:12:41 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:12:41 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:12:41 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:12:41 node-1.x.y kernel: Code: d1 00 00 00 4c 89 e7 e8 3a ff ff > ff e9 c4 00 00 00 4c 39 e3 74 c1 41 8b 54 24 1c 85 d2 74 b8 8d 4a 01 89 d0 > f0 41 0f b1 4c 24 1c <39> c2 74 04 89 c2 eb e8 e8 f3 f0 ae ff 49 89 c5 f0 > 41 0f ba 2c > > > > Jul 11 14:13:23 node-1.x.y systemd[1]: gpfs.service stopping timed out. > Terminating. > > > > Jul 11 14:13:27 node-1.x.y kernel: NMI watchdog: BUG: soft lockup - CPU#28 > stuck for 21s! [rmmod:2695] > > > > Jul 11 14:13:27 node-1.x.y kernel: Modules linked in: mmfs26(OE-) > mmfslinux(OE) tracedev(OE) tcp_diag inet_diag rdma_ucm(OE) ib_ucm(OE) > rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) > mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) > mlx4_ib(OE) ib_core(OE) vfat fat ext4 sb_edac edac_core intel_powerclamp > coretemp intel_rapl iosf_mbi mbcache jbd2 kvm irqbypass crc32_pclmul > ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd > iTCO_wdt iTCO_vendor_support ipmi_ssif pcc_cpufreq hpilo ipmi_si sg hpwdt > pcspkr i2c_i801 lpc_ich ipmi_devintf wmi ioatdma shpchp ipmi_msghandler > > Jul 11 14:13:27 node-1.x.y kernel: INFO: rcu_sched detected stalls on > CPUs/tasks: > > Jul 11 14:13:27 node-1.x.y kernel: { > > Jul 11 14:13:27 node-1.x.y kernel: 28 > > Jul 11 14:13:27 node-1.x.y kernel: } > > Jul 11 14:13:27 node-1.x.y kernel: (detected by 17, t=60002 jiffies, > g=267734, c=267733, q=36089) > > Jul 11 14:13:27 node-1.x.y kernel: Task dump for CPU 28: > > Jul 11 14:13:27 node-1.x.y kernel: rmmod R > > Jul 11 14:13:27 node-1.x.y kernel: running task > > Jul 11 14:13:27 node-1.x.y kernel: 0 2695 2642 0x00000008 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __free_slab+0xdc/0x200 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> mmfs+0xc85/0xca0 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: acpi_power_meter > > Jul 11 14:13:27 node-1.x.y kernel: binfmt_misc nfsd auth_rpcgss nfs_acl > lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe mlx4_core(OE) crct10dif_pclmul > mdio mlx_compat(OE) crct10dif_common drm ptp crc32c_intel devlink hpsa > pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: tracedev] > > Jul 11 14:13:27 node-1.x.y kernel: CPU: 28 PID: 2695 Comm: rmmod Tainted: > G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > Jul 11 14:13:27 node-1.x.y kernel: Hardware name: HP ProLiant DL380 > Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 > > Jul 11 14:13:27 node-1.x.y kernel: task: ffff8808c4814f10 ti: > ffff881619778000 task.ti: ffff881619778000 > > Jul 11 14:13:27 node-1.x.y kernel: RIP: 0010:[] > [] __put_compound_page+0x22/0x22 > > Jul 11 14:13:27 node-1.x.y kernel: RSP: 0018:ffff88161977bd70 EFLAGS: > 00000282 > > Jul 11 14:13:27 node-1.x.y kernel: RAX: 002fffff00008010 RBX: > 0000000000000135 RCX: 00000000000001c1 > > Jul 11 14:13:27 node-1.x.y kernel: RDX: ffff8814adbbf000 RSI: > 0000000000000246 RDI: ffffea00650e7040 > > Jul 11 14:13:27 node-1.x.y kernel: RBP: ffff88161977bd78 R08: > ffff881ffae3df60 R09: 0000000180800052 > > Jul 11 14:13:27 node-1.x.y kernel: R10: 00000000fae3db01 R11: > ffffea007feb8f40 R12: ffff881ffae3df60 > > Jul 11 14:13:27 node-1.x.y kernel: R13: 0000000180800052 R14: > 00000000fae3db01 R15: ffffea007feb8f40 > > Jul 11 14:13:27 node-1.x.y kernel: FS: 00007f81a1db0740(0000) > GS:ffff883ffee80000(0000) knlGS:0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > > Jul 11 14:13:27 node-1.x.y kernel: CR2: 00007fa96e38f980 CR3: > 0000000c36b2c000 CR4: 00000000001607e0 > > Jul 11 14:13:27 node-1.x.y kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > > Jul 11 14:13:27 node-1.x.y kernel: DR3: 0000000000000000 DR6: > 00000000fffe0ff0 DR7: 0000000000000400 > > Jul 11 14:13:27 node-1.x.y kernel: Call Trace: > > Jul 11 14:13:27 node-1.x.y kernel: [] ? 
> put_page+0x45/0x50 > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiReleaseAndForgetPages+0xb2/0x1c0 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiDeallocPageList+0x45/0x110 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] mmfs+0xc85/0xca0 > [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > gpfs_clean+0x26/0x30 [mmfslinux] > > Jul 11 14:13:27 node-1.x.y kernel: [] > cleanup_module+0x25/0x30 [mmfs26] > > Jul 11 14:13:27 node-1.x.y kernel: [] > SyS_delete_module+0x19b/0x300 > > Jul 11 14:13:27 node-1.x.y kernel: [] > system_call_fastpath+0x16/0x1b > > Jul 11 14:13:27 node-1.x.y kernel: Code: c0 0f 95 c0 0f b6 c0 5d c3 0f 1f > 44 00 00 55 48 89 e5 53 48 8b 07 48 89 fb a8 20 74 05 e8 0c f8 ae ff 48 89 > df ff 53 60 5b 5d c3 <0f> 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 07 > 48 89 fb f6 > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 643176 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Jul 12 15:47:00 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 12 Jul 2018 10:47:00 -0400 Subject: [gpfsug-discuss] File placement rule for new files in directory - PATH_NAME In-Reply-To: References: <4fc216a9-d721-4bd5-76f6-2476dae2c22d@img.cas.cz><8EE9E4B1-D6BC-4F49-9F12-8936BBACAF3E@bham.ac.uk><3cf065b3-383b-d1ff-1a33-3cc4b5845274@img.cas.cz> Message-ID: Why no path name in SET POOL rule? Maybe more than one reason, but consider, that in Unix, the API has the concept of "current directory" and "create a file in the current directory" AND another process or thread may at any time rename (mv!) any directory... So even it you "think" you know the name of the directory in which you are creating a file, you really don't know for sure! So, you may ask, how does the command /bin/pwd work? It follows the parent inode field of each inode, searches the parent for a matching inode, stashes the name in a buffer... When it reaches the root, it prints out the apparent path it found to the root... Which could be wrong by the time it reaches the root! For example: [root@~/gpfs-git]$mkdir -p /tmp/a/b/c/d [root@~/gpfs-git]$cd /tmp/a/b/c/d [root at .../c/d]$/bin/pwd /tmp/a/b/c/d [root at .../c/d]$pwd /tmp/a/b/c/d [root at .../c/d]$mv /tmp/a/b /tmp/a/b2 [root at .../c/d]$pwd /tmp/a/b/c/d # Bash still "thinks" it is in /tmp/a/b/c/d [root at .../c/d]$/bin/pwd /tmp/a/b2/c/d # But /bin/pwd knows better -------------- next part -------------- An HTML attachment was scrubbed... 
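For readers who want the practical upshot of Marc's explanation: since a SET POOL rule cannot match on a directory path, one common workaround is to key placement off something that is stable at file-creation time, such as the fileset the new file is created in. A minimal sketch follows; the file system, fileset and pool names are invented for illustration and not taken from this thread:

# mmcrfileset fs1 scratchfset
# mmlinkfileset fs1 scratchfset -J /gpfs/fs1/scratch

and then, in the policy file installed with "mmchpolicy fs1 policy.rules":

RULE 'scratchPlacement' SET POOL 'fastpool' FOR FILESET ('scratchfset')
RULE 'default' SET POOL 'datapool'

Unlike a path name, the fileset identity of the parent directory does not change when the directory is renamed, so the placement decision never has to resolve a full path at create time.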
URL: From heiner.billich at psi.ch Thu Jul 12 16:21:50 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 12 Jul 2018 15:21:50 +0000 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> Message-ID: <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> Hello Sven, The machine has maxFilesToCache 204800 (2M) it will become a CES node, hence the higher than default value. It?s just a 3 node cluster with remote cluster mount and no activity (yet). But all three nodes are listed as token server by ?mmdiag ?tokenmgr?. Top showed 100% idle on core 55. This matches the kernel messages about rmmod being stuck on core 55. I didn?t see a dominating thread/process, but many kernel threads showed 30-40% CPU, in sum that used about 50% of all cpu available. This time mmshutdown did return and left the module loaded, next mmstartup tried to remove the ?old? module and got stuck :-( I append two links to screenshots Thank you, Heiner https://pasteboard.co/Hu86DKf.png https://pasteboard.co/Hu86rg4.png If the links don?t work I can post the images to the list. Kernel messages: [ 857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: G W OEL ------------ 3.10.0-693.17.1.el7.x86_64 #1 [ 857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 [ 857.884938] task: ffff883ffafe8fd0 ti: ffff88342af30000 task.ti: ffff88342af30000 [ 857.924120] RIP: 0010:[] [] compound_unlock_irqrestore+0xe/0x20 [ 857.970708] RSP: 0018:ffff88342af33d38 EFLAGS: 00000246 [ 857.999742] RAX: 0000000000000000 RBX: ffff88207ffda068 RCX: 00000000000000e5 [ 858.037165] RDX: 0000000000000246 RSI: 0000000000000246 RDI: 0000000000000246 [ 858.074416] RBP: ffff88342af33d38 R08: 0000000000000000 R09: 0000000000000000 [ 858.111519] R10: ffff88207ffcfac0 R11: ffffea00fff40280 R12: 0000000000000200 [ 858.148421] R13: 00000001fff40280 R14: ffffffff8118cd84 R15: ffff88342af33ce8 [ 858.185845] FS: 00007fc797d1e740(0000) GS:ffff883fff0c0000(0000) knlGS:0000000000000000 [ 858.227062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 858.257819] CR2: 00000000004116d0 CR3: 0000003fc2ec0000 CR4: 00000000001607e0 [ 858.295143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 858.332145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 858.369097] Call Trace: [ 858.384829] [] put_compound_page+0x149/0x174 [ 858.416176] [] put_page+0x45/0x50 [ 858.443185] [] cxiReleaseAndForgetPages+0xda/0x220 [mmfslinux] [ 858.481751] [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux] [ 858.518206] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 858.554438] [] ? 
_raw_spin_lock+0x10/0x30 [ 858.585522] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 858.622670] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 858.659246] [] mmfs+0xc85/0xca0 [mmfs26] [ 858.689379] [] gpfs_clean+0x26/0x30 [mmfslinux] [ 858.722330] [] cleanup_module+0x25/0x30 [mmfs26] [ 858.755431] [] SyS_delete_module+0x19b/0x300 [ 858.786882] [] system_call_fastpath+0x16/0x1b [ 858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 13 5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 [ 859.068528] hrtimer: interrupt took 2877171 ns [ 870.517924] INFO: rcu_sched self-detected stall on CPU { 55} (t=240003 jiffies g=18437 c=18436 q=194992) [ 870.577882] Task dump for CPU 55: [ 870.602837] rmmod R running task 0 16429 16374 0x00000008 [ 870.645206] Call Trace: [ 870.666388] [] sched_show_task+0xa8/0x110 [ 870.704271] [] dump_cpu_task+0x39/0x70 [ 870.738421] [] rcu_dump_cpu_stacks+0x90/0xd0 [ 870.775339] [] rcu_check_callbacks+0x442/0x730 [ 870.812353] [] ? tick_sched_do_timer+0x50/0x50 [ 870.848875] [] update_process_times+0x46/0x80 [ 870.884847] [] tick_sched_handle+0x30/0x70 [ 870.919740] [] tick_sched_timer+0x39/0x80 [ 870.953660] [] __hrtimer_run_queues+0xd4/0x260 [ 870.989276] [] hrtimer_interrupt+0xaf/0x1d0 [ 871.023481] [] local_apic_timer_interrupt+0x35/0x60 [ 871.061233] [] smp_apic_timer_interrupt+0x3d/0x50 [ 871.097838] [] apic_timer_interrupt+0x232/0x240 [ 871.133232] [] ? put_page_testzero+0x8/0x15 [ 871.170089] [] put_compound_page+0x151/0x174 [ 871.204221] [] put_page+0x45/0x50 [ 871.234554] [] cxiReleaseAndForgetPages+0xda/0x220 [mmfslinux] [ 871.275763] [] ? cxiDeallocPageList+0xbd/0x110 [mmfslinux] [ 871.316987] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 871.468753] [] mmfs+0xc85/0xca0 [mmfs26] [ 871.501196] [] gpfs_clean+0x26/0x30 [mmfslinux] [ 871.536562] [] cleanup_module+0x25/0x30 [mmfs26] [ 871.572110] [] SyS_delete_module+0x19b/0x300 [ 871.606048] [] system_call_fastpath+0x16/0x1b -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Thursday 12 July 2018 at 15:42 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown if that happens it would be interesting what top reports start top in a large resolution window (like 330x80) , press shift-H , this will break it down per Thread, also press 1 to have a list of each cpu individually and see if you can either spot one core on the top list with 0% idle or on the thread list on the bottom if any of the threads run at 100% core speed. attached is a screenshot which columns to look at , this system is idle, so nothing to see, just to show you where to look does this machine by any chance has either large maxfilestochache or is a token server ? -------------- next part -------------- An HTML attachment was scrubbed... 
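For anyone who wants to run the same checks Sven describes, a rough sequence is below. These are standard Spectrum Scale and Linux tools, but verify the exact options and output fields on your own release:

# top -H
(press 1 for the per-CPU listing and shift-H for the per-thread view; look for a core stuck at 0% idle or a single thread pinned at 100%)

# mmdiag --tokenmgr
(lists the nodes currently acting as token servers, as Heiner already checked above)

# mmlsconfig maxFilesToCache
(shows whether the node carries a much larger than default file cache, which means correspondingly more objects for the daemon to tear down at mmshutdown)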
URL: From oehmes at gmail.com Thu Jul 12 16:30:43 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 12 Jul 2018 08:30:43 -0700 Subject: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown In-Reply-To: <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> References: <8DBE3A16-7FAA-4961-B875-D79C60D051A1@psi.ch> <76DE62CB-0E55-417F-B041-77B2ABE6606D@psi.ch> <68AA932A-5A53-4EA8-879E-A843783DF0F4@psi.ch> Message-ID: Hi, the problem is the cleanup of the tokens and/or the openfile objects. i suggest you open a defect for this. sven On Thu, Jul 12, 2018 at 8:22 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > > > > > Hello Sven, > > > > The machine has > > > > maxFilesToCache 204800 (2M) > > > > it will become a CES node, hence the higher than default value. It?s just > a 3 node cluster with remote cluster mount and no activity (yet). But all > three nodes are listed as token server by ?mmdiag ?tokenmgr?. > > > > Top showed 100% idle on core 55. This matches the kernel messages about > rmmod being stuck on core 55. > > I didn?t see a dominating thread/process, but many kernel threads showed > 30-40% CPU, in sum that used about 50% of all cpu available. > > > > This time mmshutdown did return and left the module loaded, next mmstartup > tried to remove the ?old? module and got stuck :-( > > > > I append two links to screenshots > > > > Thank you, > > > > Heiner > > > > https://pasteboard.co/Hu86DKf.png > > https://pasteboard.co/Hu86rg4.png > > > > If the links don?t work I can post the images to the list. > > > > Kernel messages: > > > > [ 857.791050] CPU: 55 PID: 16429 Comm: rmmod Tainted: G W OEL > ------------ 3.10.0-693.17.1.el7.x86_64 #1 > > [ 857.842265] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, > BIOS P89 01/22/2018 > > [ 857.884938] task: ffff883ffafe8fd0 ti: ffff88342af30000 task.ti: > ffff88342af30000 > > [ 857.924120] RIP: 0010:[] [] > compound_unlock_irqrestore+0xe/0x20 > > [ 857.970708] RSP: 0018:ffff88342af33d38 EFLAGS: 00000246 > > [ 857.999742] RAX: 0000000000000000 RBX: ffff88207ffda068 RCX: > 00000000000000e5 > > [ 858.037165] RDX: 0000000000000246 RSI: 0000000000000246 RDI: > 0000000000000246 > > [ 858.074416] RBP: ffff88342af33d38 R08: 0000000000000000 R09: > 0000000000000000 > > [ 858.111519] R10: ffff88207ffcfac0 R11: ffffea00fff40280 R12: > 0000000000000200 > > [ 858.148421] R13: 00000001fff40280 R14: ffffffff8118cd84 R15: > ffff88342af33ce8 > > [ 858.185845] FS: 00007fc797d1e740(0000) GS:ffff883fff0c0000(0000) > knlGS:0000000000000000 > > [ 858.227062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 858.257819] CR2: 00000000004116d0 CR3: 0000003fc2ec0000 CR4: > 00000000001607e0 > > [ 858.295143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > > [ 858.332145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > > [ 858.369097] Call Trace: > > [ 858.384829] [] put_compound_page+0x149/0x174 > > [ 858.416176] [] put_page+0x45/0x50 > > [ 858.443185] [] cxiReleaseAndForgetPages+0xda/0x220 > [mmfslinux] > > [ 858.481751] [] ? cxiDeallocPageList+0xbd/0x110 > [mmfslinux] > > [ 858.518206] [] cxiDeallocPageList+0x45/0x110 > [mmfslinux] > > [ 858.554438] [] ? 
_raw_spin_lock+0x10/0x30 > > [ 858.585522] [] cxiFreeSharedMemory+0x12a/0x130 > [mmfslinux] > > [ 858.622670] [] kxFreeAllSharedMemory+0xe2/0x160 > [mmfs26] > > [ 858.659246] [] mmfs+0xc85/0xca0 [mmfs26] > > [ 858.689379] [] gpfs_clean+0x26/0x30 [mmfslinux] > > [ 858.722330] [] cleanup_module+0x25/0x30 [mmfs26] > > [ 858.755431] [] SyS_delete_module+0x19b/0x300 > > [ 858.786882] [] system_call_fastpath+0x16/0x1b > > [ 858.818776] Code: 89 ca 44 89 c1 4c 8d 43 10 e8 6f 2b ff ff 89 c2 48 89 > 13 5b 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 f0 80 67 03 fe 48 89 f7 57 9d > <0f> 1f 44 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 > > [ 859.068528] hrtimer: interrupt took 2877171 ns > > [ 870.517924] INFO: rcu_sched self-detected stall on CPU { 55} (t=240003 > jiffies g=18437 c=18436 q=194992) > > [ 870.577882] Task dump for CPU 55: > > [ 870.602837] rmmod R running task 0 16429 16374 > 0x00000008 > > [ 870.645206] Call Trace: > > [ 870.666388] [] sched_show_task+0xa8/0x110 > > [ 870.704271] [] dump_cpu_task+0x39/0x70 > > [ 870.738421] [] rcu_dump_cpu_stacks+0x90/0xd0 > > [ 870.775339] [] rcu_check_callbacks+0x442/0x730 > > [ 870.812353] [] ? tick_sched_do_timer+0x50/0x50 > > [ 870.848875] [] update_process_times+0x46/0x80 > > [ 870.884847] [] tick_sched_handle+0x30/0x70 > > [ 870.919740] [] tick_sched_timer+0x39/0x80 > > [ 870.953660] [] __hrtimer_run_queues+0xd4/0x260 > > [ 870.989276] [] hrtimer_interrupt+0xaf/0x1d0 > > [ 871.023481] [] local_apic_timer_interrupt+0x35/0x60 > > [ 871.061233] [] smp_apic_timer_interrupt+0x3d/0x50 > > [ 871.097838] [] apic_timer_interrupt+0x232/0x240 > > [ 871.133232] [] ? put_page_testzero+0x8/0x15 > > [ 871.170089] [] put_compound_page+0x151/0x174 > > [ 871.204221] [] put_page+0x45/0x50 > > [ 871.234554] [] cxiReleaseAndForgetPages+0xda/0x220 > [mmfslinux] > > [ 871.275763] [] ? cxiDeallocPageList+0xbd/0x110 > [mmfslinux] > > [ 871.316987] [] cxiDeallocPageList+0x45/0x110 > [mmfslinux] > > [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 > > [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 > [mmfslinux] > > [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 > [mmfs26] > > [ 871.468753] [] mmfs+0xc85/0xca0 [mmfs26] > > [ 871.501196] [] gpfs_clean+0x26/0x30 [mmfslinux] > > [ 871.536562] [] cleanup_module+0x25/0x30 [mmfs26] > > [ 871.572110] [] SyS_delete_module+0x19b/0x300 > > [ 871.606048] [] system_call_fastpath+0x16/0x1b > > > > -- > > Paul Scherrer Institut > > Science IT > > Heiner Billich > > WHGA 106 > > CH 5232 Villigen PSI > > 056 310 36 02 > > https://www.psi.ch > > > > > > *From: * on behalf of Sven > Oehme > > > *Reply-To: *gpfsug main discussion list > > *Date: *Thursday 12 July 2018 at 15:42 > > > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown > > > > if that happens it would be interesting what top reports > > > > start top in a large resolution window (like 330x80) , press shift-H , > this will break it down per Thread, also press 1 to have a list of each cpu > individually and see if you can either spot one core on the top list with > 0% idle or on the thread list on the bottom if any of the threads run at > 100% core speed. > > attached is a screenshot which columns to look at , this system is idle, > so nothing to see, just to show you where to look > > > > does this machine by any chance has either large maxfilestochache or is a > token server ? 
> > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Fri Jul 13 11:07:25 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Fri, 13 Jul 2018 10:07:25 +0000 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Message-ID: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch> Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 4s.png Type: image/png Size: 129914 bytes Desc: 4s.png URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Jul 15 18:24:43 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 15 Jul 2018 17:24:43 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? Message-ID: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... 
Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Jul 15 18:34:45 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Sun, 15 Jul 2018 17:34:45 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? In-Reply-To: References: Message-ID: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov> Hmm...have you dumped waiters across the entire cluster or just on the NSD servers/fs managers? Maybe there?s a slow node out there participating in the suspend effort? Might be worth running some quick tracing on the FS manager to see what it?s up to. On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L wrote: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sun Jul 15 20:11:26 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sun, 15 Jul 2018 19:11:26 +0000 Subject: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace? In-Reply-To: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov> References: <9B63AEFC-0A19-4FA2-B04B-FCB066B7C9BD@nasa.gov> Message-ID: <08D6C49B-298F-4DAA-8FF3-BDAA6D9CE8FE@vanderbilt.edu> Hi All, So I had noticed some waiters on my NSD servers that I thought were unrelated to the mmchdisk. However, I decided to try rebooting my NSD servers one at a time (mmshutdown failed!) to clear that up ? and evidently one of them had things hung up because the mmchdisk start completed. Thanks? Kevin On Jul 15, 2018, at 12:34 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] > wrote: Hmm...have you dumped waiters across the entire cluster or just on the NSD servers/fs managers? 
Maybe there?s a slow node out there participating in the suspend effort? Might be worth running some quick tracing on the FS manager to see what it?s up to. On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L > wrote: Hi All, We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems: 1. gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which I?ve unmounted across the cluster because it has data replication set to 1. 2. gpfs22 - 42 TB and which corresponds to /home. It has data replication set to two, so what we?re doing is ?mmchdisk gpfs22 suspend -d ?, then doing the firmware upgrade, and once the array is back we?re doing a ?mmchdisk gpfs22 resume -d ?, followed by ?mmchdisk gpfs22 start -d ?. On the 1st storage array this went very smoothly ? the mmchdisk took about 5 minutes, which is what I would expect. But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it?s been stuck at: mmchdisk: Processing continues ... Scanning file system metadata, phase 1 ? There are no waiters of any significance and ?mmdiag ?iohist? doesn?t show any issues either. Any ideas, anyone? Unless I can figure this out I?m hosed for this downtime, as I?ve got 7 more arrays to do after this one! Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd518db52846a4be34e2208d5ea7a00d7%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636672732087040757&sdata=m77IpWNOlODc%2FzLiYI2qiPo9Azs8qsIdXSY8%2FoC6Nn0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Thu Jul 19 15:05:39 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Thu, 19 Jul 2018 14:05:39 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Message-ID: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. 
M?nchen URL: http://www.lrz.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Jul 19 15:23:42 2018 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 19 Jul 2018 10:23:42 -0400 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> References: <03AEBEA7-B319-4DB1-A0D8-4A250038F8E7@lrz.de> Message-ID: Hi Stephan: I think every node in C1 and in C2 have to see every node in the server cluster NSD-[AD]. We have a 10 node server cluster where 2 nodes do nothing but server out nfs. Since these two are apart of the server cluster...client clusters wanting to mount the server cluster via gpfs need to see them. I think both OPA fabfics need to be on all 4 of your server nodes. Eric On Thu, Jul 19, 2018 at 10:05 AM, Peinkofer, Stephan < Stephan.Peinkofer at lrz.de> wrote: > Dear GPFS List, > > does anyone of you know, if it is possible to have multiple file systems > in a GPFS Cluster that all are served primary via Ethernet but for which > different ?booster? connections to various IB/OPA fabrics exist. > > For example let?s say in my central Storage/NSD Cluster, I implement two > file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is > served by NSD-C and NSD-D. > Now I have two client Clusters C1 and C2 which have different OPA fabrics. > Both Clusters can mount the two file systems via Ethernet, but I now add > OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for > NSD-C and NSD-D to C2?s fabric and just switch on RDMA. > As far as I understood, GPFS will use RDMA if it is available between two > nodes but switch to Ethernet if RDMA is not available between the two > nodes. So given just this, the above scenario could work in principle. But > will it work in reality and will it be supported by IBM? > > Many thanks in advance. > Best Regards, > Stephan Peinkofer > -- > Stephan Peinkofer > Leibniz Supercomputing Centre > Data and Storage Division > Boltzmannstra?e 1, 85748 Garching b. M?nchen > URL: http://www.lrz.de > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jul 19 16:42:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Jul 2018 15:42:48 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Message-ID: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> I think what you want is to use fabric numbers with verbsPorts, e.g. we have two IB fabrics and in the config we do thinks like: [nodeclass1] verbsPorts mlx4_0/1/1 [nodeclass2] verbsPorts mlx5_0/1/3 GPFS recognises the /1 or /3 at the end as a fabric number and knows they are separate and will Ethernet between those nodes instead. Simon From: on behalf of "Stephan.Peinkofer at lrz.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 July 2018 at 15:13 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. 
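(To make Simon's suggestion concrete: the trailing number in each verbsPorts entry is the fabric number, and it is normally set per node class with mmchconfig. A hedged sketch follows, in which the node class names, node names and the OPA HFI device name are placeholders rather than anything from this thread:

# mmcrnodeclass opaFabric1 -N nsd-a,nsd-b,c1-node01
# mmcrnodeclass opaFabric2 -N nsd-c,nsd-d,c2-node01
# mmchconfig verbsPorts="hfi1_0/1/1" -N opaFabric1
# mmchconfig verbsPorts="hfi1_0/1/2" -N opaFabric2
# mmchconfig verbsRdma=enable -N opaFabric1,opaFabric2

The verbs settings are typically picked up when the daemon is restarted on those nodes. Nodes whose verbsPorts entries do not share a fabric number will fall back to the normal TCP path, which is the behaviour being asked about in this thread.)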
For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Jul 19 17:54:22 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 19 Jul 2018 12:54:22 -0400 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> References: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> Message-ID: To add to the excellent advice others have already provided, I think you have fundamentally 2 choices: - Establish additional OPA connections from NSD-A and NSD-B to cluster C2 and from NSD-C and NSD-D to cluster C1 *or* - Add NSD-A and NSD-B as nsd servers for the NSDs for FS2 and add NSD-C and NSD-D as nsd servers for the NSDs for FS1. (Note: If you're running Scale 5.0 you can change the NSD server list with the FS available and mounted, else you'll need an outage to unmount the FS and change the NSD server list.) It's a matter of what's preferable (aasier, cheaper, etc.)-- adding OPA connections to the NSD servers or adding additional LUN presentations (which may involve SAN connections, of course) to the NSD servers. In our environment we do the latter and it works very well for us. -Aaron On 7/19/18 11:42 AM, Simon Thompson wrote: > I think what you want is to use fabric numbers with verbsPorts, e.g. we > have two IB fabrics and in the config we do thinks like: > > [nodeclass1] > > verbsPorts mlx4_0/1/1 > > [nodeclass2] > > verbsPorts mlx5_0/1/3 > > GPFS recognises the /1 or /3 at the end as a fabric number and knows > they are separate and will Ethernet between those nodes instead. > > Simon > > *From: * on behalf of > "Stephan.Peinkofer at lrz.de" > *Reply-To: *"gpfsug-discuss at spectrumscale.org" > > *Date: *Thursday, 19 July 2018 at 15:13 > *To: *"gpfsug-discuss at spectrumscale.org" > *Subject: *[gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD > Cluster > > Dear GPFS List, > > does anyone of you know, if it is possible to have multiple file systems > in a GPFS Cluster that all are served primary via Ethernet but for which > different ?booster? connections to various IB/OPA fabrics exist. > > For example let?s say in my central Storage/NSD Cluster, I implement two > file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is > served by NSD-C and NSD-D. > > Now I have two client Clusters C1 and C2 which have different OPA > fabrics. 
Both Clusters can mount the two file systems via Ethernet, but > I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA > connections for NSD-C and NSD-D to ?C2?s fabric and just switch on RDMA. > > As far as I understood, GPFS will use RDMA if it is available between > two nodes but switch to Ethernet if RDMA is not available between the > two nodes. So given just this, the above scenario could work in > principle. But will it work in reality and will it be supported by IBM? > > Many thanks in advance. > > Best Regards, > > Stephan Peinkofer > > -- > Stephan Peinkofer > Leibniz Supercomputing Centre > Data and Storage Division > Boltzmannstra?e 1, 85748 Garching b.?M?nchen > URL: http://www.lrz.de > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From valdis.kletnieks at vt.edu Thu Jul 19 22:25:23 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 19 Jul 2018 17:25:23 -0400 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? Message-ID: <25435.1532035523@turing-police.cc.vt.edu> So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error(archive/system) (...) Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn(archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jul 19 23:23:06 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 19 Jul 2018 22:23:06 +0000 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? 
In-Reply-To: <25435.1532035523@turing-police.cc.vt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> Hi Valdis, Is this what you?re looking for (from an IBMer in response to another question a few weeks back)? assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 # mmhealth thresholds delete MetaDataCapUtil_Rule The rule(s) was(were) deleted successfully # mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name # mmhealth thresholds list ### Threshold Rules ### rule_name metric error warn direction filterBy groupBy sensitivity -------------------------------------------------------------------------------------------------------------------------------------------------------- InodeCapUtil_Rule Fileset_inode 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name 300 MemFree_Rule mem_memfree 50000 100000 low node 300 DataCapUtil_Rule DataPool_capUtil 90.0 80.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 MetaDataCapUtil_Rule MetaDataPool_capUtil 95.0 85.0 high gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name 300 Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 19, 2018, at 4:25 PM, valdis.kletnieks at vt.edu wrote: So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error(archive/system) (...) 
Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn(archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ca2e808fa12e74ed277bc08d5edc51bc3%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636676353194563950&sdata=5biJuM0K0XwEw3BMwbS5epNQhrlig%2FFON7k1V79G%2Fyc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Fri Jul 20 07:39:24 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 20 Jul 2018 06:39:24 +0000 Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster In-Reply-To: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> References: <673207F9-A74C-44BB-A37E-12BCD7B1FF4D@bham.ac.uk> Message-ID: <05cf5689138043da8321b728f320834c@lrz.de> Dear Simon and List, thanks. That was exactly I was looking for. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Thursday, July 19, 2018 5:42 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster I think what you want is to use fabric numbers with verbsPorts, e.g. we have two IB fabrics and in the config we do thinks like: [nodeclass1] verbsPorts mlx4_0/1/1 [nodeclass2] verbsPorts mlx5_0/1/3 GPFS recognises the /1 or /3 at the end as a fabric number and knows they are separate and will Ethernet between those nodes instead. Simon From: on behalf of "Stephan.Peinkofer at lrz.de" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 July 2018 at 15:13 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Mixing RDMA Client Fabrics for a single NSD Cluster Dear GPFS List, does anyone of you know, if it is possible to have multiple file systems in a GPFS Cluster that all are served primary via Ethernet but for which different ?booster? connections to various IB/OPA fabrics exist. For example let?s say in my central Storage/NSD Cluster, I implement two file systems FS1 and FS2. FS1 is served by NSD-A and NSD-B and FS2 is served by NSD-C and NSD-D. Now I have two client Clusters C1 and C2 which have different OPA fabrics. Both Clusters can mount the two file systems via Ethernet, but I now add OPA connections for NSD-A and NSD-B to C1?s fabric and OPA connections for NSD-C and NSD-D to C2?s fabric and just switch on RDMA. As far as I understood, GPFS will use RDMA if it is available between two nodes but switch to Ethernet if RDMA is not available between the two nodes. So given just this, the above scenario could work in principle. But will it work in reality and will it be supported by IBM? Many thanks in advance. 
Best Regards, Stephan Peinkofer -- Stephan Peinkofer Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de LRZ: Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften www.lrz.de Das LRZ ist das Rechenzentrum f?r die M?nchner Universit?ten, die Bayerische Akademie der Wissenschaften sowie nationales Zentrum f?r Hochleistungsrechnen. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Fri Jul 20 09:29:29 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 20 Jul 2018 16:29:29 +0800 Subject: [gpfsug-discuss] mmfsadddisk command interrupted In-Reply-To: References: Message-ID: Hi Damir, Since many GPFS management command got unresponsive and you are running ESS, mail-list maybe not a good way to track this kinds of issue. Could you please raise a ticket to ESS/SpectrumScale to get help from IBM Service team? Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Damir Krstic To: gpfsug main discussion list Date: 06/23/2018 03:04 AM Subject: [gpfsug-discuss] mmfsadddisk command interrupted Sent by: gpfsug-discuss-bounces at spectrumscale.org We were adding disks to one of our larger filesystems today. During the "checking allocation map for storage pool system" we had to interrupt the command since it was causing slow downs on our filesystem. Now commands like mmrepquota, mmdf, etc. are timing out with tsaddisk command is running message. Also during the run of the mmdf, mmrepquota, etc. filesystem becomes completely unresponsive. This command was run on ESS running version 5.2.0. Any help is much appreciated. Thank you. Damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From YARD at il.ibm.com Sat Jul 21 21:22:47 2018 From: YARD at il.ibm.com (Yaron Daniel) Date: Sat, 21 Jul 2018 23:22:47 +0300 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi Do u run mmbackup on snapshot , which is read only ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 11294 bytes Desc: not available URL: From p.childs at qmul.ac.uk Sun Jul 22 12:26:35 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Sun, 22 Jul 2018 11:26:35 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk>, Message-ID: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. 
I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0C9372140C936C60006FF189C22582D1] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_0C9306EC0C92FECC006FF189C22582D1][cid:_1_0C9308F40C92FECC006FF189C22582D1] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: ATT00001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: ATT00002.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 5093 bytes Desc: ATT00003.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 4746 bytes Desc: ATT00004.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ATT00005.gif Type: image/gif Size: 4557 bytes Desc: ATT00005.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 5093 bytes Desc: ATT00006.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.jpg Type: image/jpeg Size: 11294 bytes Desc: ATT00007.jpg URL: From jose.filipe.higino at gmail.com Sun Jul 22 13:51:03 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 00:51:03 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00003.gif Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00004.gif Type: image/gif Size: 4746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00005.gif Type: image/gif Size: 4557 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00006.gif Type: image/gif Size: 5093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00007.jpg Type: image/jpeg Size: 11294 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00002.gif Type: image/gif Size: 4376 bytes Desc: not available URL: From scale at us.ibm.com Mon Jul 23 04:06:33 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 23 Jul 2018 11:06:33 +0800 Subject: [gpfsug-discuss] -o syncnfs has no effect? In-Reply-To: References: Message-ID: Hi, mmchfs Device -o syncnfs is the correct way of setting the syncnfs so that it applies to the file system both on the home and the remote cluster On 4.2.3+ syncnfs is the default option on Linux . Which means GPFS will implement the syncnfs behavior regardless of what the mount command says The documentation indicates that mmmount Device -o syncnfs=yes appears to be the correct syntax. When I tried that, I do see 'syncnfs=yes' in the output of the 'mount' command To change the remote mount option so that you don't have to specify the option on the command line every time you do mmmount, instead of using mmchfs, one should use mmremotefs update -o. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "Billich Heinrich Rainer (PSI)" To: gpfsug main discussion list Date: 07/06/2018 12:06 AM Subject: [gpfsug-discuss] -o syncnfs has no effect? Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hello, I try to mount a fs with "-o syncnfs" as we'll export it with CES/Protocols. But I never see the mount option displayed when I do # mount | grep fs-name

This is a remote cluster mount, we'll run the Protocol nodes in a separate cluster. On the home cluster I see the option 'nfssync' in the output of 'mount'. My conclusion is that the mount option "syncnfs" has no effect on remote cluster mounts. Which seems a bit strange? Please can someone clarify this? What is the impact on protocol nodes exporting remote cluster mounts? Is there any chance of data corruption? Or are some mount options implicitly inherited from the home cluster? I've read 'syncnfs' is default on Linux, but I would like to know for sure.

Funny enough I can pass arbitrary options with # mmmount -o some-garbage which are silently ignored.

I did 'mmchfs -o syncnfs' on the home cluster and the syncnfs option is present in /etc/fstab on the remote cluster. I did not remount on all nodes.

Thank you, I'll appreciate any hints or replies. Heiner

Versions: Remote cluster 5.0.1 on RHEL7.4 (mounts the fs and runs protocol nodes) Home cluster 4.2.3-8 on RHEL6 (exports the fs, owns the storage) Filesystem: 17.00 (4.2.3.0) All Linux x86_64 with Spectrum Scale Standard Edition

-- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL:

From scale at us.ibm.com Mon Jul 23 07:51:54 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 23 Jul 2018 14:51:54 +0800 Subject: [gpfsug-discuss] mmdiag --iohist question In-Reply-To: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> References: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Message-ID:

Hi, please check the IO type before examining the IP address in the output of mmdiag --iohist. For the "lcl" (local) IO, the IP address is not needed and we don't show it. Please check whether this is your case.
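For reference, when post-processing this history it can help to split the records on that Type column before looking at the address field; a rough sketch with awk (the field number assumes the default column layout shown in the excerpt that follows):

# mmdiag --iohist | awk '$7 == "lcl"'     # local I/O, no NSD node address expected
# mmdiag --iohist | awk '$7 != "lcl"'     # non-local I/O, address column populated

The sample history below shows both "srv" and "lcl" entries.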
=== mmdiag: iohist ===

I/O history:

I/O start time  RW  Buf type  disk:sectorNum  nSec  time ms  Type  Device/NSD ID  NSD node
--------------- --  --------- --------------- ----- -------- ----- -------------- ---------------
01:14:08.450177 R   inode     6:189513568     8     4.920    srv   dm-4           192.168.116.92
01:14:08.450448 R   inode     6:189513664     8     4.968    srv   dm-4           192.168.116.92
01:14:08.475689 R   inode     6:189428264     8     0.230    srv   dm-4           192.168.116.92
01:14:08.983587 W   logData   4:30686784      8     0.216    lcl   dm-0
01:14:08.983601 W   logData   3:25468480      8     0.197    lcl   dm-8
01:14:08.983961 W   inode     2:188808504     8     0.142    lcl   dm-11
01:14:08.984144 W   inode     1:188808504     8     0.134    lcl   dm-7

Regards, The Spectrum Scale (GPFS) team
------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/11/2018 10:34 PM Subject: [gpfsug-discuss] mmdiag --iohist question Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi All, Quick question about 'mmdiag --iohist' that is not documented in the man page - what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally - and the way I discovered it was that our Python script that takes 'mmdiag --iohist' output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception - and when I started looking at the 'mmdiag --iohist' output itself I do see times when there is no client IP address listed for an I/O wait. Thanks...

Kevin -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL:

From p.childs at qmul.ac.uk Mon Jul 23 09:37:41 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 08:37:41 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> Message-ID: <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk>

On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup?

Not really. It feels like a perfect storm: any one of the tasks running on its own would be fine, it's the sheer load; our mmpmon data says the storage has been flatlining when it occurs.
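For reference, this sort of per-filesystem throughput sampling can be reproduced with a small mmpmon command file; the file name, interval and repeat count below are only illustrative:

# echo fs_io_s > /tmp/mmpmon.cmd
# mmpmon -i /tmp/mmpmon.cmd -d 5000 -r 12

Each fs_io_s cycle reports cumulative bytes read and written per mounted file system, which is enough to spot the kind of flatlining described here during a bad period.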
Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [cid:_1_0C9372140C936C60006FF189C22582D1] Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][cid:_1_0C9306EC0C92FECC006FF189C22582D1][cid:_1_0C9308F40C92FECC006FF189C22582D1] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. 
Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Mon Jul 23 11:13:56 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 22:13:56 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? How many quorum nodes? How many filesystems? Is the management network the same as the daemon network? On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you > tell us a bit more about the setup? > > > Not really, It feels like a perfect storm, any one of the tasks running on > its own would be fine, Its the shear load, our mmpmon data says the storage > has been flat lining when it occurs. > > Its a reasonably standard (small) HPC cluster, with a very mixed work > load, hence while we can usually find "bad" jobs from the point of view of > io on this occasion we can see a few large array jobs all accessing the > same file, the cluster runs fine until we get to a certain point and one > more will tip the balance. We've been attempting to limit the problem by > adding limits to the number of jobs in an array that can run at once. But > that feels like fire fighting. > > > Are you using GPFS API over any administrative commands? Any problems with > the network (being that Ethernet or IB)? 
> > > We're not as using the GPFS API, never got it working, which is a shame, > I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests > by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit > network currently, we're currently looking at removing all the 1GBit nodes > within the next few months and adding some new faster kit. The Storage is > attached at 40GBit but it does not look to want to run much above 5Gbit I > suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to > help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 23 12:06:20 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 11:06:20 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? 
Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd [X] Storage Architect ? 
IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel [IBM Storage Strategy and Solutions v1][IBM Storage Management and Data Protection v1][X][X] [https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] [Related image] From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From jose.filipe.higino at gmail.com Mon Jul 23 12:59:22 2018 From: jose.filipe.higino at gmail.com (=?UTF-8?Q?Jos=C3=A9_Filipe_Higino?=) Date: Mon, 23 Jul 2018 23:59:22 +1200 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk> <51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: Are the tiebreaker disks part of the same storage that is being used to provide disks for the NSDs of your filesystem? Having both management and daemon networks on the same network can impact the cluster in many ways. Depending on the requirements and workload conditions to run the cluster. Especially if the network is not 100% top notch or can be affected by external factors (other types of utilization). I would recur to a recent (and/or run a new one) performance benchmark result (IOR and MDTEST) and try to understand if the recordings of the current performance while observing the problem really tell something new. 
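For reference, a baseline run of the kind suggested above might look like the following; the MPI launcher, paths, process counts and transfer sizes are purely illustrative and would need adjusting to the cluster:

# mpirun -np 16 ior -a POSIX -w -r -t 1m -b 4g -F -o /gpfs/fs0/bench/ior.dat
# mpirun -np 16 mdtest -n 10000 -i 3 -d /gpfs/fs0/bench/mdtest

Comparing a run taken while the large array jobs are active with one taken on a quiet system helps show whether the back end has actually degraded or is simply saturated.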
If not (if benchmarks tell that you are at the edge of the performance, then the best would be to consider increasing cluster performance) with additional disk hardware and/or network performance. If possible I would also recommend upgrading to the new Spectrum Scale 5 that have many new performance features. On Mon, 23 Jul 2018 at 23:06, Peter Childs wrote: > On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: > > I think the network problems need to be cleared first. Then I would > investigate further. > > Buf if that is not a trivial path... > Are you able to understand from the mmfslog what happens when the tipping > point occurs? > > > mmfslog thats not a term I've come accross before, if you mean > /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In > other words no expulsions or errors just a very slow filesystem, We've not > seen any significantly long waiters either (mmdiag --waiters) so as far as > I can see its just behaving like a very very busy filesystem. > > We've already had IBM looking at the snaps due to the rather slow mmbackup > process, all I've had back is to try increase -a ie the number of sort > threads which has speed it up to a certain extent, But once again I think > we're looking at the results of the issue not the cause. > > > In my view, when troubleshooting is not easy, the usual methods work/help > to find the next step: > - Narrow the window of troubleshooting (by discarding "for now" events > that did not happen within the same timeframe) > - Use "as precise" as possible, timebased events to read the reaction of > the cluster (via log or others) and make assumptions about other observed > situations. > - If possible and when the problem is happening, run some traces, > gpfs.snap and ask for support via PMR. > > Also, > > What is version of GPFS? > > > 4.2.3-8 > > How many quorum nodes? > > > 4 Quorum nodes with tie breaker disks, however these are not the file > system manager nodes as to fix a previous problem (with our nsd servers not > being powerful enough) our fsmanager nodes are on hardware, We have two > file system manager nodes (Which do token management, quota management etc) > they also run the mmbackup. > > How many filesystems? > > > 1, although we do have a second that is accessed via multi-cluster from > our older GPFS setup, (thats running 4.2.3-6 currently) > > Is the management network the same as the daemon network? > > > Yes. the management network and the daemon network are the same network. > > Thanks in advance > > Peter Childs > > > > On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: > > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you > tell us a bit more about the setup? > > > Not really, It feels like a perfect storm, any one of the tasks running on > its own would be fine, Its the shear load, our mmpmon data says the storage > has been flat lining when it occurs. > > Its a reasonably standard (small) HPC cluster, with a very mixed work > load, hence while we can usually find "bad" jobs from the point of view of > io on this occasion we can see a few large array jobs all accessing the > same file, the cluster runs fine until we get to a certain point and one > more will tip the balance. We've been attempting to limit the problem by > adding limits to the number of jobs in an array that can run at once. But > that feels like fire fighting. 
> > > Are you using GPFS API over any administrative commands? Any problems with > the network (being that Ethernet or IB)? > > > We're not as using the GPFS API, never got it working, which is a shame, > I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests > by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit > network currently, we're currently looking at removing all the 1GBit nodes > within the next few months and adding some new faster kit. The Storage is > attached at 40GBit but it does not look to want to run much above 5Gbit I > suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to > help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: > > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many > hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs > accessing the same file from many nodes also look to have come to an end > for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to > prioritise certain nodes over others but had not worked out if that was > possible yet. > > We've only previously seen this problem when we had some bad disks in our > storage, which we replaced, I've checked and I can't see that issue > currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > ------------------------------ > > > > *Yaron Daniel* 94 Em Ha'Moshavot Rd > *Storage Architect ? IL Lab Services (Storage)* Petach Tiqva, 49527 > *IBM Global Markets, Systems HW Sales* Israel > > Phone: +972-3-916-5672 > Fax: +972-3-916-5672 > Mobile: +972-52-8395593 > e-mail: yard at il.ibm.com > *IBM Israel* > > > > > [image: IBM Storage Strategy and Solutions v1][image: IBM Storage > Management and Data Protection v1] [image: > https://acclaim-production-app.s3.amazonaws.com/images/6c2c3858-6df8-45be-ac2b-f93b8da74e20/Data%2BDriven%2BMulti%2BCloud%2BStrategy%2BV1%2Bver%2B4.png] > [image: Related image] > > > > From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / > processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Jul 23 13:06:22 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 23 Jul 2018 08:06:22 -0400 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk><51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> Message-ID: Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/23/2018 07:06 AM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. 
We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs wrote: Yes, we run mmbackup, using a snapshot. 
The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Jul 23 19:12:25 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 23 Jul 2018 14:12:25 -0400 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? 
In-Reply-To: <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> <2165FB72-BF80-4EE4-908F-0399620C83D6@vanderbilt.edu> Message-ID: <22017.1532369545@turing-police.cc.vt.edu> On Thu, 19 Jul 2018 22:23:06 -0000, "Buterbaugh, Kevin L" said: > Is this what you???re looking for (from an IBMer in response to another question a few weeks back)? > > assuming 4.2.3 code level this can be done by deleting and recreating the rule with changed settings: Nope, that bring zero joy (though it did give me a chance to set a more appropriate set of thresholds for our environment. And I'm still perplexed as to *where* those events are stored - what's remembering it after a 'mmhealth eventlog --clear -N all'? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jul 23 21:05:05 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 23 Jul 2018 20:05:05 +0000 Subject: [gpfsug-discuss] mmdiag --iohist question In-Reply-To: References: <351F676D-D785-4895-A278-3BEA717B9C87@vanderbilt.edu> Message-ID: Hi GPFS team, Yes, that?s what we see, too ? thanks. Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 23, 2018, at 1:51 AM, IBM Spectrum Scale > wrote: Hi Please check the IO type before examining the IP address for the output of mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and we don't show it. Please check whether this is your case. === mmdiag: iohist === I/O history: I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD node --------------- -- ----------- ----------------- ----- ------- ---- ------------------ --------------- 01:14:08.450177 R inode 6:189513568 8 4.920 srv dm-4 192.168.116.92 01:14:08.450448 R inode 6:189513664 8 4.968 srv dm-4 192.168.116.92 01:14:08.475689 R inode 6:189428264 8 0.230 srv dm-4 192.168.116.92 01:14:08.983587 W logData 4:30686784 8 0.216 lcl dm-0 01:14:08.983601 W logData 3:25468480 8 0.197 lcl dm-8 01:14:08.983961 W inode 2:188808504 8 0.142 lcl dm-11 01:14:08.984144 W inode 1:188808504 8 0.134 lcl dm-7 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Buterbaugh, Kevin L" ---07/11/2018 10:34:32 PM---Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? 
what does it From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 07/11/2018 10:34 PM Subject: [gpfsug-discuss] mmdiag --iohist question Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Quick question about ?mmdiag ?iohist? that is not documented in the man page ? what does it mean if the client IP address field is blank? That the NSD server itself issued the I/O? Or ??? This only happens occasionally ? and the way I discovered it was that our Python script that takes ?mmdiag ?iohist? output, looks up the client IP for any waits above the threshold, converts that to a hostname, and queries SLURM for whose jobs are on that client started occasionally throwing an exception ? and when I started looking at the ?mmdiag ?iohist? output itself I do see times when there is no client IP address listed for a I/O wait. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255264001433&sdata=uSiXYheeOw%2F4%2BSls8lP3XO9w7i7dFc3UWEYa%2F8aIn%2B0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Mon Jul 23 21:06:14 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Mon, 23 Jul 2018 20:06:14 +0000 Subject: [gpfsug-discuss] Same file opened by many nodes / processes In-Reply-To: References: <4e038c492713f418242be208532e112f8ea50a9f.camel@qmul.ac.uk><51f1f0984a7801d65043bd9fd2643bb3d641f6b0.camel@qmul.ac.uk> , Message-ID: ---- Frederick Stock wrote ---- > Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Considered, but never really understood the logic or value of building a second network, nor seen a good argument for the additional cost and work setting it up. While I've heard it lots of times, that the network is key to good gpfs performance. I've actually always found that it can be lots of other things too and your usally best keeping and open view and checking everything. This issue disappeared on Friday when the file system manager locked up entirely, and we failed it over to the other one and restarted gpfs. It's been fine all weekend, and currently it's looking to be a failed gpfs daemon on the manager node that was causing all the bad io. If I'd know that I'd have restarted gpfs on that node earlier... > > Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? > > You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? 
> Our nsd servers are virtual everything else on the cluster is real. It's a gridscaler gs7k. Hence why it's difficult to throw more power at the issue. We are looking at upgrading to 5.0.1, within the next few months as we're in the progress of adding a new ssd based scratch filesystem to the cluster. Hopefully this will help resolve some of our issues. Peter Childs. > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: Peter Childs > > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/23/2018 07:06 AM > Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: > I think the network problems need to be cleared first. Then I would investigate further. > > Buf if that is not a trivial path... > Are you able to understand from the mmfslog what happens when the tipping point occurs? > > mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. > > We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. > > > In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: > - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) > - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. > - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. > > Also, > > What is version of GPFS? > > 4.2.3-8 > > How many quorum nodes? > > 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. > > How many filesystems? > > 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) > > Is the management network the same as the daemon network? > > Yes. the management network and the daemon network are the same network. > > Thanks in advance > > Peter Childs > > > > On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: > On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: > > Hi there, > > Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? > > Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. 
> > Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. > > > Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? > > We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. > > Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. > > While we do have some IB we don't currently run our storage over it. > > Thanks in advance > > Peter Childs > > > > > > Sorry if I am un-announced here for the first time. But I would like to help if I can. > > Jose Higino, > from NIWA > New Zealand > > Cheers > > On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: > Yes, we run mmbackup, using a snapshot. > > The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) > > It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. > > I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. > > We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. > > Thanks for the help. > > > > Peter Childs > Research Storage > ITS Research and Teaching Support > Queen Mary, University of London > > ---- Yaron Daniel wrote ---- > > Hi > > Do u run mmbackup on snapshot , which is read only ? > > > Regards > > > > Yaron Daniel 94 Em Ha'Moshavot Rd > > Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel > > > > > > > From: Peter Childs > > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/10/2018 05:51 PM > Subject: [gpfsug-discuss] Same file opened by many nodes / processes > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > We have an situation where the same file is being read by around 5000 > "jobs" this is an array job in uge with a tc set, so the file in > question is being opened by about 100 processes/jobs at the same time. > > Its a ~200GB file so copying the file locally first is not an easy > answer, and these jobs are causing issues with mmbackup scanning the > file system, in that the scan is taking 3 hours instead of the normal > 40-60 minutes. > > This is read only access to the file, I don't know the specifics about > the job. 
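
On the ARP-cache change mentioned earlier in this thread, the usual Linux knobs are the neighbour-table thresholds. This is a sketch only; the file name and values below are placeholders and need sizing for the actual number of hosts on the subnet.

    # /etc/sysctl.d/99-neigh.conf (example values, not a recommendation)
    net.ipv4.neigh.default.gc_thresh1 = 4096
    net.ipv4.neigh.default.gc_thresh2 = 8192
    net.ipv4.neigh.default.gc_thresh3 = 16384
    # apply with: sysctl -p /etc/sysctl.d/99-neigh.conf
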
> > It looks like the metanode is moving around a fair amount (given what I > can see from mmfsadm saferdump file) > > I'm wondering if we there is anything we can do to improve things or > that can be tuned within GPFS, I'm don't think we have an issue with > token management, but would increasing maxFileToCache on our token > manager node help say? > > Is there anything else I should look at, to try and attempt to allow > GPFS to share this file better. > > Thanks in advance > > Peter Childs > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss Have you considered keeping the 1G network for daemon traffic and moving the data traffic to another network? Given the description of your configuration with only 2 manager nodes handling mmbackup and other tasks my guess is that is where the problem lies regarding performance when mmbackup is running with the many nodes accessing a single file. You said the fs managers were on hardware, does that mean other nodes in this cluster are VMs of some kind? You stated that your NSD servers were under powered. Did you address that problem in any way, that is adding memory/CPUs, or did you just move other GPFS activity off of those nodes? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: Peter Childs To: "gpfsug-discuss at spectrumscale.org" Date: 07/23/2018 07:06 AM Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ On Mon, 2018-07-23 at 22:13 +1200, Jos? Filipe Higino wrote: I think the network problems need to be cleared first. Then I would investigate further. Buf if that is not a trivial path... Are you able to understand from the mmfslog what happens when the tipping point occurs? mmfslog thats not a term I've come accross before, if you mean /var/adm/ras/mmfs.log.latest then I'm already there is not a lot there, In other words no expulsions or errors just a very slow filesystem, We've not seen any significantly long waiters either (mmdiag --waiters) so as far as I can see its just behaving like a very very busy filesystem. We've already had IBM looking at the snaps due to the rather slow mmbackup process, all I've had back is to try increase -a ie the number of sort threads which has speed it up to a certain extent, But once again I think we're looking at the results of the issue not the cause. 
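
For reference, a hedged sketch of the mmbackup invocation being discussed, backing up from a snapshot and raising the number of sort threads with -a as IBM suggested; the device name, snapshot name, node list and thread count are placeholders, and the right -a value depends on the cores and memory available on the nodes doing the scan.

    mmcrsnapshot gpfs0 backupsnap
    mmbackup gpfs0 -t incremental -S backupsnap -a 8 -N fsmgr1,fsmgr2
    mmdelsnapshot gpfs0 backupsnap
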
In my view, when troubleshooting is not easy, the usual methods work/help to find the next step: - Narrow the window of troubleshooting (by discarding "for now" events that did not happen within the same timeframe) - Use "as precise" as possible, timebased events to read the reaction of the cluster (via log or others) and make assumptions about other observed situations. - If possible and when the problem is happening, run some traces, gpfs.snap and ask for support via PMR. Also, What is version of GPFS? 4.2.3-8 How many quorum nodes? 4 Quorum nodes with tie breaker disks, however these are not the file system manager nodes as to fix a previous problem (with our nsd servers not being powerful enough) our fsmanager nodes are on hardware, We have two file system manager nodes (Which do token management, quota management etc) they also run the mmbackup. How many filesystems? 1, although we do have a second that is accessed via multi-cluster from our older GPFS setup, (thats running 4.2.3-6 currently) Is the management network the same as the daemon network? Yes. the management network and the daemon network are the same network. Thanks in advance Peter Childs On Mon, 23 Jul 2018 at 20:37, Peter Childs > wrote: On Mon, 2018-07-23 at 00:51 +1200, Jos? Filipe Higino wrote: Hi there, Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup? Not really, It feels like a perfect storm, any one of the tasks running on its own would be fine, Its the shear load, our mmpmon data says the storage has been flat lining when it occurs. Its a reasonably standard (small) HPC cluster, with a very mixed work load, hence while we can usually find "bad" jobs from the point of view of io on this occasion we can see a few large array jobs all accessing the same file, the cluster runs fine until we get to a certain point and one more will tip the balance. We've been attempting to limit the problem by adding limits to the number of jobs in an array that can run at once. But that feels like fire fighting. Are you using GPFS API over any administrative commands? Any problems with the network (being that Ethernet or IB)? We're not as using the GPFS API, never got it working, which is a shame, I've never managed to figure out the setup, although it is on my to do list. Network wise, We've just removed a great deal of noise from arp requests by increasing the arp cache size on the nodes. Its a mixed 1GBit/10GBit network currently, we're currently looking at removing all the 1GBit nodes within the next few months and adding some new faster kit. The Storage is attached at 40GBit but it does not look to want to run much above 5Gbit I suspect due to Ethernet back off due to the mixed speeds. While we do have some IB we don't currently run our storage over it. Thanks in advance Peter Childs Sorry if I am un-announced here for the first time. But I would like to help if I can. Jose Higino, from NIWA New Zealand Cheers On Sun, 22 Jul 2018 at 23:26, Peter Childs > wrote: Yes, we run mmbackup, using a snapshot. The scan usally takes an hour, but for the last week has been taking many hours (i saw it take 12 last Tuesday) It's speeded up again now back to its normal hour, but the high io jobs accessing the same file from many nodes also look to have come to an end for the time being. I was trying to figure out howto control the bad io using mmchqos, to prioritise certain nodes over others but had not worked out if that was possible yet. 
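
On the mmchqos question: at the 4.2.3 level QoS classes are applied per storage pool rather than per node, so the usual approach is to cap the "maintenance" class (which the policy scan behind mmbackup, mmrestripefs and similar maintenance commands run in) while leaving "other" unlimited. A sketch with the device, pool and IOPS figure as placeholders:

    mmchqos gpfs0 --enable pool=system,maintenance=1000IOPS,other=unlimited
    mmlsqos gpfs0 --seconds 60     # watch what each class actually consumes
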
We've only previously seen this problem when we had some bad disks in our storage, which we replaced, I've checked and I can't see that issue currently. Thanks for the help. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Yaron Daniel wrote ---- Hi Do u run mmbackup on snapshot , which is read only ? Regards ________________________________ Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: Peter Childs > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/10/2018 05:51 PM Subject: [gpfsug-discuss] Same file opened by many nodes / processes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We have an situation where the same file is being read by around 5000 "jobs" this is an array job in uge with a tc set, so the file in question is being opened by about 100 processes/jobs at the same time. Its a ~200GB file so copying the file locally first is not an easy answer, and these jobs are causing issues with mmbackup scanning the file system, in that the scan is taking 3 hours instead of the normal 40-60 minutes. This is read only access to the file, I don't know the specifics about the job. It looks like the metanode is moving around a fair amount (given what I can see from mmfsadm saferdump file) I'm wondering if we there is anything we can do to improve things or that can be tuned within GPFS, I'm don't think we have an issue with token management, but would increasing maxFileToCache on our token manager node help say? Is there anything else I should look at, to try and attempt to allow GPFS to share this file better. Thanks in advance Peter Childs _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Tue Jul 24 08:45:03 2018 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Tue, 24 Jul 2018 09:45:03 +0200 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? In-Reply-To: <25435.1532035523@turing-police.cc.vt.edu> References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: Hi, that message is still in memory. "mmhealth node eventlog --clear" deletes all old events but those which are currently active are not affected. I think this is related to multiple Collector Nodes, will dig deeper into that code to find out if some issue lurks there. As a stop-gap measure one could execute "mmsysmoncontrol restart" on the affected node(s) as this stops the monitoring process and doing so clears the event in memory. 
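
Putting those two suggestions together, the stop-gap on an affected node would look roughly like the following; the component name in the last check is an assumption, so adjust it to whatever mmhealth reports locally.

    mmhealth node eventlog --clear     # clear stored events on this node
    mmsysmoncontrol restart            # restart the monitor, dropping the stale in-memory event
    mmhealth node show FILESYSTEM      # confirm the pool-data_high_warn event has gone
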
The data used for the event comes from mmlspool (should be close or identical to mmdf) Mit freundlichen Gr??en / Kind regards Norbert Schuld From: valdis.kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Date: 20/07/2018 00:15 Subject: [gpfsug-discuss] mmhealth - where is the info hiding? Sent by: gpfsug-discuss-bounces at spectrumscale.org So I'm trying to tidy up things like 'mmhealth' etc. Got most of it fixed, but stuck on one thing.. Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which cleaned out a bunch of other long-past events that were "stuck" as failed / degraded even though they were corrected days/weeks ago - keep this in mind as you read on.... # mmhealth cluster show Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 10 0 0 10 0 GPFS 10 0 0 10 0 NETWORK 10 0 0 10 0 FILESYSTEM 1 0 1 0 0 DISK 102 0 0 102 0 CES 4 0 0 4 0 GUI 1 0 0 1 0 PERFMON 10 0 0 10 0 THRESHOLD 10 0 0 10 0 Great. One hit for 'degraded' filesystem. # mmhealth node show --unhealthy -N all (skipping all the nodes that show healthy) Node name: arnsd3-vtc.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ----------------------------------------------------------------------------------- FILESYSTEM FAILED 24 days ago pool-data_high_error (archive/system) (...) Node name: arproto2-isb.nis.internal Node status: HEALTHY Status Change: 21 hours ago Component Status Status Change Reasons ---------------------------------------------------------------------------------- FILESYSTEM DEGRADED 6 days ago pool-data_high_warn (archive/system) mmdf tells me: nsd_isb_01 13103005696 1 No Yes 1747905536 ( 13%) 111667200 ( 1%) nsd_isb_02 13103005696 1 No Yes 1748245504 ( 13%) 111724384 ( 1%) (94 more LUNs all within 0.2% of these for usage - data is striped out pretty well) There's also 6 SSD LUNs for metadata: nsd_isb_flash_01 2956984320 1 Yes No 2116091904 ( 72%) 26996992 ( 1%) (again, evenly striped) So who is remembering that status, and how to clear it? [attachment "attccdgx.dat" deleted by Norbert Schuld/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Tue Jul 24 14:43:52 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Tue, 24 Jul 2018 13:43:52 +0000 Subject: [gpfsug-discuss] control which hosts become token manager Message-ID: Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. 
Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From bzhang at ca.ibm.com Tue Jul 24 16:03:54 2018 From: bzhang at ca.ibm.com (Bohai Zhang) Date: Tue, 24 Jul 2018 11:03:54 -0400 Subject: [gpfsug-discuss] IBM Elastic Storage Server (ESS) Support is going to host a client facing webinar In-Reply-To: References: <25435.1532035523@turing-police.cc.vt.edu> Message-ID: Hi all, IBM Elastic Storage Server support team is going to host a webinar to discuss Spectrum Scale (GPFS) encryption. Everyone is welcome. Please use the following links to register. Thanks, NA/EU Session Date: Aug 8, 2018 Time: 10 AM - 11 AM EDT (2 PM ? 3 PM GMT) Registration: https://ibm.biz/BdY4SE Audience: Scale/ESS administrators. AP/JP/India Session Date: Aug 9, 2018 Time: 10 AM - 11 AM Beijing Time (11 AM ? 12? AM Tokyo Time) Registration: https://ibm.biz/BdY4SH Audience: Scale/ESS administrators. Regards, IBM Spectrum Computing Bohai Zhang Critical Senior Technical Leader, IBM Systems Situation Tel: 1-905-316-2727 Resolver Mobile: 1-416-897-7488 Expert Badge Email: bzhang at ca.ibm.com 3600 STEELES AVE EAST, MARKHAM, ON, L3R 9Z7, Canada Live Chat at IBMStorageSuptMobile Apps Support Portal | Fix Central | Knowledge Center | Request for Enhancement | Product SMC IBM | dWA We meet our service commitment only when you are very satisfied and EXTREMELY LIKELY to recommend IBM. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C553518.jpg Type: image/jpeg Size: 124313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C974093.gif Type: image/gif Size: 2665 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C503228.gif Type: image/gif Size: 275 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C494180.gif Type: image/gif Size: 305 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1C801702.gif Type: image/gif Size: 331 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C254205.gif Type: image/gif Size: 3621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C585014.gif Type: image/gif Size: 1243 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue Jul 24 20:28:34 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 24 Jul 2018 19:28:34 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: Message-ID: What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. >From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Tue Jul 24 22:12:06 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 24 Jul 2018 21:12:06 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: Message-ID: <366795a1f7b34edc985d85124f787774@jumptrading.com> Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn't a way to specify a preferred manager per FS... 
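
A hedged sketch of the sequence Peter describes, using the device and node names from the thread; the -c form for moving the cluster manager is worth double-checking against the mmchmgr man page on 5.0.1 before use.

    mmlsmgr                           # which node is cluster manager and fs manager per device
    mmchmgr perf node-1.psi.ch        # move the file system managers off the virtual quorum node
    mmchmgr tiered node-2.psi.ch
    mmchmgr -c node-1.psi.ch          # move the cluster manager too, if mmlsmgr shows it on the VM
    mmdiag --tokenmgr                 # re-check which nodes are serving tokens
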
(Bryan starts typing up a new RFE...). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. >From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don't want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that 'mmdiag -tokenmgr' lists the machine as active token manager. The machine has role 'quorum-client'. This doesn't seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. 
Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Wed Jul 25 17:40:46 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 25 Jul 2018 16:40:46 +0000 Subject: [gpfsug-discuss] Brief survey question: Spectrum Scale downloads and protocols Message-ID: The Spectrum Scale team is considering a change to Scale's packaging, and we'd like to get input from as many of you as possible on the likely impact. Today, Scale is available to download in two images: With Protocols, and Without Protocols. We'd like to do away with this and in future just have one image, With Protocols. To be clear, installing Protocols will still be entirely optional -- it's only the download that will change. You can find the survey here: www.surveygizmo.com/s3/4476580/IBM-Spectrum-Scale-Packaging For those interested in a little more background... Why change this? Because making two images for every Edition for every release and patch is additional work, with added testing and more opportunities for mistakes to creep in. If it's not adding real value, we'd prefer not to keep doing it! Why do we need to ask first? Because we've been doing separate images for a long time, and there was a good reason why we started doing it. But it's not clear that the original reasons are still relevant. However, we don't want to make that assumption without asking first. Thanks in advance for your help, Carl Zetie Offering Manager for Spectrum Scale, IBM - (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From SAnderson at convergeone.com Wed Jul 25 19:57:03 2018 From: SAnderson at convergeone.com (Shaun Anderson) Date: Wed, 25 Jul 2018 18:57:03 +0000 Subject: [gpfsug-discuss] Compression details Message-ID: <1532545023753.65276@convergeone.com> I've had the question come up about how SS will handle file deletion as well as overhead required for compression using zl4. The two questions I'm looking for answers (or better yet, reference material documenting) to are: 1) - How is file deletion handled? Is the block containing the compressed file decompressed, the file deleted, and then recompressed? Or is metadata simply updated showing the file is to be deleted? Does Scale run an implicit 'mmchattr --compression no' command? 2) - Are there any guidelines on the overhead to plan for in a compressed environment (lz4)? I'm not seeing any kind of sizing guidance. This is potentially going to be for an exisitng ESS GL2 system. Any assistance or direction is appreciated. Regards, ? SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments hereto may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Jul 26 00:05:27 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 25 Jul 2018 23:05:27 +0000 Subject: [gpfsug-discuss] Compression details In-Reply-To: <1532545023753.65276@convergeone.com> References: <1532545023753.65276@convergeone.com> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Jul 26 14:24:14 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 26 Jul 2018 08:24:14 -0500 Subject: [gpfsug-discuss] Compression details In-Reply-To: <1532545023753.65276@convergeone.com> References: <1532545023753.65276@convergeone.com> Message-ID: > 1) How is file deletion handled? This depends on whether there's snapshot and whether COW is needed. If COW is not needed or there's no snapshot at all, then the file deletion is handled as non-compressed file(don't decompress the data blocks and simply discard the data blocks, then delete the inode). However, even if COW is needed, then uncompression before COW is only needed when one of following conditions is true. 1) the block to be moved is not the first block of a compression group(10 blocks is compression group since block 0). 2) the compression group ends beyond the last block of destination file (file in latest snapshot). 3) the compression group is not full and the destination file is larger. 4) the compression group ends at the last block of destination file, but the size between source and destination files are different. 5) the destination file already has some allocated blocks(COWed) within the compression group. > 2) Are there any guidelines LZ4 compression algorithm is already made good trade-off between performance and compression ratio. So it really depends on your data characters and access patterns. For example: if the data is write-once but read-many times, then there shouldn't be too much overhead as only compressed one time(I suppose decompression with lz4 doesn't consume too much resource as compression). If your data is really randomized, then compressing with lz4 doesn't give back too much help on storage space save, but still need to compress data as well as decompression when needed. But note that compressed data could also reduce the overhead to storage and network because smaller I/O size would be done for compressed file, so from application overall point of view, the overhead could be not added at all.... Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre-marie.brunet at cnes.fr Fri Jul 27 01:06:44 2018 From: pierre-marie.brunet at cnes.fr (Brunet Pierre-Marie) Date: Fri, 27 Jul 2018 00:06:44 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? 
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. 
> GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** From scale at us.ibm.com Fri Jul 27 12:56:02 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 06:56:02 -0500 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: References: Message-ID: errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Brunet Pierre-Marie To: "gpfsug-discuss at spectrumscale.org" Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. 
Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From xhejtman at ics.muni.cz Fri Jul 27 13:06:11 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 27 Jul 2018 14:06:11 +0200 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: References: Message-ID: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> Hello, no it is not. It's a bug in GPFS vfs layer, efix has been already released. On Fri, Jul 27, 2018 at 06:56:02AM -0500, IBM Spectrum Scale wrote: > > errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. > > > /* Defined for the NFSv3 protocol */ > #define EBADHANDLE 521 /* Illegal NFS file handle */ > > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in > other countries. > > The forum is informally monitored as time permits and should not be used > for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/26/2018 07:17 PM > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 > and RHEL 7.5 with 4 gateways servers executing Kernel NFS... > => random "Unknown error 521" on NFS clients. > > Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers > crossed !) up to now, it seems to work properly. > > Is there any official recommendation from IBM on this problem ? 
> > Regards, > PM > -- > HPC center > French space agency > > -----Message d'origine----- > De?: gpfsug-discuss-bounces at spectrumscale.org > De la part de > gpfsug-discuss-request at spectrumscale.org > Envoy??: jeudi 14 juin 2018 13:00 > ??: gpfsug-discuss at spectrumscale.org > Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific than > "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) > 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 13 Jun 2018 17:45:44 +0300 > From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > > > > Content-Type: text/plain; charset="iso-8859-1" > > Please open a service ticket > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will add > > HA > > > to NFS on top of GPFS - > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > > > > ). > > > > knfs and cNFS can't coexist with CES in the same environment. > > well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 13 Jun 2018 15:14:53 +0000 > From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > > > > Content-Type: text/plain; charset="utf-8" > > We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version > 3.10.0-693.21.1.el7.x86_64 and are not having any errors. > So it's probably just GPFS not being ready for 7.5 yet. > > Neil. > > Neil Wilson? Senior IT Practitioner > Storage, Virtualisation and Mainframe Team?? 
IT Services > Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard > Sent: 13 June 2018 10:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > > On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > > Hello, > > > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > > Not sure whether it is due to kernel or GPFS. > > > > GPFS being not supported on 7.5 at this time would be the starting point. I > am also under the impression that kernel NFS was not supported either it's > Ganesha or nothing. > > The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the > past that has worked for me. > > JAB. > > -- > Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System > Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > End of gpfsug-discuss Digest, Vol 77, Issue 19 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From neil.wilson at metoffice.gov.uk Fri Jul 27 13:26:28 2018 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 27 Jul 2018 12:26:28 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: We are still running 7.4 with 4.2.3-9 on our NSD nodes, cNFS nodes and client nodes. A rhel 7.5 client node build is being tested at the moment and will be deployed if testing is a success. However I don't think we will be upgrading the NSD nodes or cNFS nodes to 7.5 for a while. Regards Neil Neil Wilson Senior IT Practitioner Storage, Virtualisation and Mainframe Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of IBM Spectrum Scale Sent: 27 July 2018 12:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. [Inactive hide details for Brunet Pierre-Marie ---07/26/2018 07:17:25 PM---Hi, We are facing the same issue : we just upgrade o]Brunet Pierre-Marie ---07/26/2018 07:17:25 PM---Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? Regards, PM -- HPC center French space agency -----Message d'origine----- De : gpfsug-discuss-bounces at spectrumscale.org > De la part de gpfsug-discuss-request at spectrumscale.org Envoy? : jeudi 14 juin 2018 13:00 ? : gpfsug-discuss at spectrumscale.org Objet : gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: > Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: > Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From pierre-marie.brunet at cnes.fr Fri Jul 27 14:56:04 2018 From: pierre-marie.brunet at cnes.fr (Brunet Pierre-Marie) Date: Fri, 27 Jul 2018 13:56:04 +0000 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (IBM Spectrum Scale) Message-ID: Hi Scale Team, I know but I can't reproduce the problem with a simple kernel NFS server on a RH7.5 with a local filesystem for instance. It seems to be linked somehow with GPFS 4.2.3-9... I don't know what is the behavior with previous release. But as I said, the downgrade to RHE7.4 has solved the problem... vicious bug for sure. 
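For anyone trying to confirm whether they are on the affected combination, a quick sanity check of the pieces involved could look like this (only a sketch, assuming RHEL with a standard GPFS RPM layout; package names and paths may differ on your systems):

uname -r                              # running kernel: 3.10.0-693.* is RHEL 7.4, 3.10.0-862.* is RHEL 7.5
rpm -q gpfs.base                      # installed GPFS level, e.g. 4.2.3-9
/usr/lpp/mmfs/bin/mmdiag --version    # build level of the running mmfsd
systemctl is-active nfs-server        # confirms the kernel NFS server (knfsd) is the one exporting GPFS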
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: vendredi 27 juillet 2018 14:22 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 78, Issue 68 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (IBM Spectrum Scale) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) (Lukas Hejtmanek) ---------------------------------------------------------------------- Message: 1 Date: Fri, 27 Jul 2018 06:56:02 -0500 From: "IBM Spectrum Scale" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: Content-Type: text/plain; charset="iso-8859-1" errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. /* Defined for the NFSv3 protocol */ #define EBADHANDLE 521 /* Illegal NFS file handle */ Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Brunet Pierre-Marie To: "gpfsug-discuss at spectrumscale.org" Date: 07/26/2018 07:17 PM Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are facing the same issue : we just upgrade our cluster to GPFS 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... => random "Unknown error 521" on NFS clients. Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers crossed !) up to now, it seems to work properly. Is there any official recommendation from IBM on this problem ? 
Regards, PM -- HPC center French space agency -----Message d'origine----- De?: gpfsug-discuss-bounces at spectrumscale.org De la part de gpfsug-discuss-request at spectrumscale.org Envoy??: jeudi 14 juin 2018 13:00 ??: gpfsug-discuss at spectrumscale.org Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) ---------------------------------------------------------------------- Message: 1 Date: Wed, 13 Jun 2018 17:45:44 +0300 From: "Tomer Perry" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="iso-8859-1" Please open a service ticket Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Lukas Hejtmanek To: gpfsug main discussion list Date: 13/06/2018 13:14 Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > knfs is supported - with or without the cNFS feature ( cNFS will add > HA > to NFS on top of GPFS - > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_cnfs.htm > ). > > knfs and cNFS can't coexist with CES in the same environment. well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fbce/attachment-0001.html > ------------------------------ Message: 2 Date: Wed, 13 Jun 2018 15:14:53 +0000 From: "Wilson, Neil" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 Message-ID: Content-Type: text/plain; charset="utf-8" We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version 3.10.0-693.21.1.el7.x86_64 and are not having any errors. So it's probably just GPFS not being ready for 7.5 yet. Neil. Neil Wilson? Senior IT Practitioner Storage, Virtualisation and Mainframe Team?? IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 13 June 2018 10:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > Hello, > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > I'm getting random errors: Unknown error 521. 
It means EBADHANDLE. > Not sure whether it is due to kernel or GPFS. > GPFS being not supported on 7.5 at this time would be the starting point. I am also under the impression that kernel NFS was not supported either it's Ganesha or nothing. The interim fix is probably to downgrade to a 7.4 kernel. Certainly in the past that has worked for me. JAB. -- Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 77, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ Message: 2 Date: Fri, 27 Jul 2018 14:06:11 +0200 From: Lukas Hejtmanek To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) Message-ID: <20180727120611.aunjlxht33vp7txf at ics.muni.cz> Content-Type: text/plain; charset=utf8 Hello, no it is not. It's a bug in GPFS vfs layer, efix has been already released. On Fri, Jul 27, 2018 at 06:56:02AM -0500, IBM Spectrum Scale wrote: > > errno 521 is EBADHANDLE (a Linux NFS error); it is not from spectrum scale. > > > /* Defined for the NFSv3 protocol */ > #define EBADHANDLE 521 /* Illegal NFS file handle */ > > > Regards, The Spectrum Scale (GPFS) team > > ---------------------------------------------------------------------- > -------------------------------------------- > > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks > Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please > contact > 1-800-237-5511 in the United States or your local IBM Service Center > in other countries. > > The forum is informally monitored as time permits and should not be > used for priority messages to the Spectrum Scale (GPFS) team. > > > > From: Brunet Pierre-Marie > To: "gpfsug-discuss at spectrumscale.org" > > Date: 07/26/2018 07:17 PM > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi, > > We are facing the same issue : we just upgrade our cluster to GPFS > 4.2.3-9 and RHEL 7.5 with 4 gateways servers executing Kernel NFS... > => random "Unknown error 521" on NFS clients. > > Thanks to this thread we decided to downgrade to RHEL 7.4 and (fingers > crossed !) up to now, it seems to work properly. > > Is there any official recommendation from IBM on this problem ? 
> > Regards, > PM > -- > HPC center > French space agency > > -----Message d'origine----- > De?: gpfsug-discuss-bounces at spectrumscale.org > De la part de > gpfsug-discuss-request at spectrumscale.org > Envoy??: jeudi 14 juin 2018 13:00 > ??: gpfsug-discuss at spectrumscale.org > Objet?: gpfsug-discuss Digest, Vol 77, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than > "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: GPFS 4.2.3-9 and RHEL 7.5 (Tomer Perry) > 2. Re: GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 13 Jun 2018 17:45:44 +0300 > From: "Tomer Perry" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > llabserv.com> > > > Content-Type: text/plain; charset="iso-8859-1" > > Please open a service ticket > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Lukas Hejtmanek > To: gpfsug main discussion list > Date: 13/06/2018 13:14 > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > On Wed, Jun 13, 2018 at 12:48:14PM +0300, Tomer Perry wrote: > > knfs is supported - with or without the cNFS feature ( cNFS will > > add HA > > > to NFS on top of GPFS - > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.sp > ectrum.scale.v5r01.doc/bl1adv_cnfs.htm > > > > ). > > > > knfs and cNFS can't coexist with CES in the same environment. > > well, I use knfs and no CNFS n RHEL 7.5 and getting mentioned issues. > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > -------------- next part -------------- An HTML attachment was > scrubbed... > URL: < > http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20180613/3cf6fb > ce/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Wed, 13 Jun 2018 15:14:53 +0000 > From: "Wilson, Neil" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > Message-ID: > > changelabs.com> > > > Content-Type: text/plain; charset="utf-8" > > We are running a cNFS on GPFS 4.2.3-9 on 7.4 and kernel version > 3.10.0-693.21.1.el7.x86_64 and are not having any errors. > So it's probably just GPFS not being ready for 7.5 yet. > > Neil. > > Neil Wilson? Senior IT Practitioner > Storage, Virtualisation and Mainframe Team?? 
IT Services Met Office > FitzRoy Road Exeter Devon EX1 3PB United Kingdom > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Jonathan > Buzzard > Sent: 13 June 2018 10:33 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 > > On Wed, 2018-06-13 at 11:10 +0200, Lukas Hejtmanek wrote: > > Hello, > > > > did anyone encountered an error with RHEL 7.5 kernel 3.10.0- > > 862.3.2.el7.x86_64 and the latest GPFS 4.2.3-9 with kernel NFS? > > > > I'm getting random errors: Unknown error 521. It means EBADHANDLE. > > Not sure whether it is due to kernel or GPFS. > > > > GPFS being not supported on 7.5 at this time would be the starting > point. I am also under the impression that kernel NFS was not > supported either it's Ganesha or nothing. > > The interim fix is probably to downgrade to a 7.4 kernel. Certainly in > the past that has worked for me. > > JAB. > > -- > Jonathan A. Buzzard?????????????????????????Tel: +44141-5483420 HPC > System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > End of gpfsug-discuss Digest, Vol 77, Issue 19 > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 78, Issue 68 ********************************************** From scale at us.ibm.com Fri Jul 27 15:43:16 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 27 Jul 2018 09:43:16 -0500 Subject: [gpfsug-discuss] GPFS 4.2.3-9 and RHEL 7.5 (Wilson, Neil) In-Reply-To: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> References: <20180727120611.aunjlxht33vp7txf@ics.muni.cz> Message-ID: There is a fix in 4.2.3.9 efix3 that corrects a condition where GPFS was failing a revalidate call and that was causing kNFS to generate EBADHANDLE. Without more information on your case (traces), I cannot say for sure that this will resolve your issue, but it is available for you to try. Regards, The Spectrum Scale (GPFS) team -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:18:50 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:18:50 +0000 Subject: [gpfsug-discuss] Power9 / GPFS Message-ID: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? 
So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 27 16:30:42 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 27 Jul 2018 15:30:42 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> Hi Simon, Have you tried running it with the ??silent? flag, too? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 27, 2018, at 10:18 AM, Simon Thompson > wrote: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. 
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338&sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Fri Jul 27 16:32:55 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 15:32:55 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: <366795a1f7b34edc985d85124f787774@jumptrading.com> References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Bryan Banister Reply-To: gpfsug main discussion list Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. 
I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
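Collected in one place, the checks and the change suggested earlier in this thread look roughly like the following (a sketch only; node and file system names are taken from the quoted output above, and whether moving the manager role applies at all to a storage-less remote cluster is exactly the open question here):

mmlscluster                    # Designation column shows quorum / quorum-manager / client roles
mmlsmgr                        # which node currently acts as manager for each file system
mmdiag --tokenmgr              # which token servers are appointed for each mounted file system
mmchmgr perf node-1.psi.ch     # Peter's suggestion: move the file system manager for 'perf'
mmchmgr tiered node-2.psi.ch   # ... and for 'tiered'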
-------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Fri Jul 27 16:40:11 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 15:40:11 +0000 Subject: [gpfsug-discuss] Power9 / GPFS Message-ID: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> Hello If you don?t need the installer maybe just extract the RPMs, this bypasses java. For x86_64 I use commands like the once below, shouldn?t be much different on power. TARFILE=$1 START=$( grep -a -m 1 ^PGM_BEGIN_TGZ= $TARFILE| cut -d= -f2) echo extract RPMs from $TARFILE with START=$START tail -n +$START $TARFILE | tar xvzf - *.rpm */repodata/* Kind regards, Heiner -- Paul Scherrer Institut From: on behalf of Simon Thompson Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:19 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:41:39 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:41:39 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> References: <13B6CE9B-CF93-43BB-A120-136CCC3AC7BC@vanderbilt.edu> Message-ID: Yeah does the same ? The system java seems to do it is well ? maybe its just broken ? 
Simon From: on behalf of "Buterbaugh, Kevin L" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 16:32 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Hi Simon, Have you tried running it with the ??silent? flag, too? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jul 27, 2018, at 10:18 AM, Simon Thompson > wrote: I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338&sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Jul 27 16:35:14 2018 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 27 Jul 2018 15:35:14 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: License acceptance notwithstanding, the RPM extraction should at least be achievable with? 
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: Friday, July 27, 2018 11:19 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 16:54:16 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 15:54:16 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> References: <6756DCC6-7366-4F8C-8D61-F38D0241CDB4@psi.ch> Message-ID: <986024E4-512D-45A0-A859-EBED468B07A3@bham.ac.uk> Thanks, (and also Paul with a very similar comment)? I now have my packages unpacked ? and hey, who needs java anyway ? Simon From: on behalf of "heiner.billich at psi.ch" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 16:40 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Hello If you don?t need the installer maybe just extract the RPMs, this bypasses java. For x86_64 I use commands like the once below, shouldn?t be much different on power. 
TARFILE=$1 START=$( grep -a -m 1 ^PGM_BEGIN_TGZ= $TARFILE| cut -d= -f2) echo extract RPMs from $TARFILE with START=$START tail -n +$START $TARFILE | tar xvzf - *.rpm */repodata/* Kind regards, Heiner -- Paul Scherrer Institut From: on behalf of Simon Thompson Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:19 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Power9 / GPFS I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From gcorneau at us.ibm.com Fri Jul 27 17:02:42 2018 From: gcorneau at us.ibm.com (Glen Corneau) Date: Fri, 27 Jul 2018 11:02:42 -0500 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I feel like I must be doing something stupid here but ? We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? 
So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 26117 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Jul 27 17:05:37 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 27 Jul 2018 16:05:37 +0000 Subject: [gpfsug-discuss] Power9 / GPFS In-Reply-To: References: Message-ID: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk> # uname -a Linux localhost.localdomain 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux Its literally out of the box ? Simon From: on behalf of "gcorneau at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 27 July 2018 at 17:03 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Power9 / GPFS Just curious, do you have the zStream1 patches installed? # uname -a Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux ------------------ Glen Corneau Cognitive Systems Washington Systems Center gcorneau at us.ibm.com [cid:_2_DC560798DC56051000576CD7862582D7] From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 07/27/2018 10:19 AM Subject: [gpfsug-discuss] Power9 / GPFS Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I feel like I must be doing something stupid here but ? 
We?re trying to install GPFS onto some Power 9 AI systems we?ve just got? So from Fix central, we download ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install?, however we are failing to unpack the file: ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only Extracting License Acceptance Process Tool to 5.0.1.1 ... tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null Installing JRE ... If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted. tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz Invoking License Acceptance Process Tool ... 5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only Unhandled exception Type=Segmentation error vmState=0xffffffff J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001 Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40 R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000 R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0 This looks like the java runtime is failing during the license approval status. First off, can someone confirm that ?Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install? is indeed the correct package we are downloading for Power9, and then any tips on how to extract the packages. These systems are running the IBM factory shipped install of RedHat 7.5. Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 26118 bytes Desc: image001.jpg URL: From heiner.billich at psi.ch Fri Jul 27 17:50:17 2018 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 27 Jul 2018 16:50:17 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: Hello, So probably I was wrong from the beginning ? please can somebody clarify: In a multicluster environment with all storage and filesystem hosted by a single cluster all token managers will reside in this central cluster? Or are there also token managers in the storage-less clusters which just mount? This managers wouldn?t be accessible by all nodes which access the file system, hence I doubt this exists. Still it would be nice to know how to influence the token manager placement and how to exclude certain machines. And the output of ?mmdiag ?tokenmgr? indicates that there _are_ token manager in the remote-mounting cluster ? confusing. I would greatly appreciate if somebody could sort this out. A point to the relevant documentation would also be welcome. 
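If eligibility really follows the node designation, then I would expect the relevant knobs to be the manager/client roles in the cluster that owns the file system, roughly like this (a sketch only; the node names below are placeholders and this still needs to be confirmed against the documentation):

mmlscluster                          # check the Designation column for 'manager'
mmchnode --client -N weak-vm-node    # placeholder name: drop the manager role, quorum designation is unchanged
mmchnode --manager -N capable-node   # placeholder name: allow this node to be appointed as a token server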
Thank you & Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of "Billich Heinrich Rainer (PSI)" Reply-To: gpfsug main discussion list Date: Friday 27 July 2018 at 17:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: on behalf of Bryan Banister Reply-To: gpfsug main discussion list Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. I suspect you need to do a mmchmgr perf node-1.psi.ch mmchmgr tiered node-2.psi.ch It looks like the node was set up as a manager and was demoted to just quorum but since its still currently the manager it needs to be told to stop. From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr. Peter Childs Research Storage ITS Research and Teaching Support Queen Mary, University of London ---- Billich Heinrich Rainer (PSI) wrote ---- Hello, I want to control which nodes can become token manager. In detail I run a virtual machine as quorum node. I don?t want this machine to become a token manager - it has no access to Infiniband and only very limited memory. What I see is that ?mmdiag ?tokenmgr? lists the machine as active token manager. The machine has role ?quorum-client?. This doesn?t seem sufficient to exclude it. Is there any way to tell spectrum scale to exclude this single machine with role quorum-client? I run 5.0.1-1. Sorry if this is a faq, I did search quite a bit before I wrote to the list. 
Thank you, Heiner Billich [root at node-2 ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: node.psi.ch GPFS cluster id: 5389874024582403895 GPFS UID domain: node.psi.ch Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation -------------------------------------------------------------------------------- 1 node-1.psi.ch a.b.95.31 node-1.psi.ch quorum-manager 2 node-2.psi.ch a.b.95.32 node-2.psi.ch quorum-manager 3 node-quorum.psi.ch a.b.95.30 node-quorum.psi.ch quorum <<<< VIRTUAL MACHINE >>>>>>>>> [root at node-2 ~]# mmdiag --tokenmgr === mmdiag: tokenmgr === Token Domain perf There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> Token Domain tiered There are 3 active token servers in this domain. Server list: a.b.95.120 a.b.95.121 a.b.95.122 <<<< VIRTUAL MACHINE >>>>>>>>> -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Jul 27 18:09:56 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 27 Jul 2018 17:09:56 +0000 Subject: [gpfsug-discuss] control which hosts become token manager In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com> Message-ID: <40989560bbc0448896e0301407388790@jumptrading.com> Yes, the token managers will reside on the NSD Server Cluster which has the NSD Servers that provide access to the underlying data and metadata storage. I believe that all nodes that have the ?manager? designation will participate in the token management operations as needed. Though there is not a way to specify which node will be assigned the primary file system manager or overall cluster manager, which are two different roles but may reside on the same node. Tokens themselves, however, are distributed and managed by clients directly. When a file is first opened then the node that opened the file will be the ?metanode? 
for the file, and all metadata updates on the file will be handled by this metanode until it closes the file handle, in which case another node will become the ?metanode?. For byte range locking, the file system manager will handle revoking tokens from nodes that have a byte range lock when another node requests access to the same byte range region. This ensures that nodes cannot hold byte range locks that prevent other nodes from accessing byte range regions of a file. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Billich Heinrich Rainer (PSI) Sent: Friday, July 27, 2018 11:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ Hello, So probably I was wrong from the beginning ? please can somebody clarify: In a multicluster environment with all storage and filesystem hosted by a single cluster all token managers will reside in this central cluster? Or are there also token managers in the storage-less clusters which just mount? This managers wouldn?t be accessible by all nodes which access the file system, hence I doubt this exists. Still it would be nice to know how to influence the token manager placement and how to exclude certain machines. And the output of ?mmdiag ?tokenmgr? indicates that there _are_ token manager in the remote-mounting cluster ? confusing. I would greatly appreciate if somebody could sort this out. A point to the relevant documentation would also be welcome. Thank you & Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: > on behalf of "Billich Heinrich Rainer (PSI)" > Reply-To: gpfsug main discussion list > Date: Friday 27 July 2018 at 17:33 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Thank you, The cluster was freshly set up and the VM node never was denoted as manager, it was created as quorum-client. What I didn?t mention but probably should have: This is a multicluster mount, the cluster has no own storage. Hence the filesystem manager are on the home cluster, according to mmlsmgr. Hm, probably more complicated as I initially thought. Still I would expect that for file-access that is restricted to this cluster all token management is handled inside the cluster, too? And I don?t want the weakest node to participate. Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From: > on behalf of Bryan Banister > Reply-To: gpfsug main discussion list > Date: Tuesday 24 July 2018 at 23:12 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Agree with Peter here. And if the file system and workload are of significant size then isolating the token manager to a dedicated node is definitely best practice. Unfortunately there isn?t a way to specify a preferred manager per FS? (Bryan starts typing up a new RFE?). Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peter Childs Sent: Tuesday, July 24, 2018 2:29 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] control which hosts become token manager Note: External Email ________________________________ What does mmlsmgr show? Your config looks fine. 
I suspect you need to do a

mmchmgr perf node-1.psi.ch
mmchmgr tiered node-2.psi.ch

It looks like the node was set up as a manager and was demoted to just quorum, but since it is still currently the manager it needs to be told to stop.

From experience it's also worth having different file system managers on different nodes, if at all possible. But that's just a guess without seeing the output of mmlsmgr.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

---- Billich Heinrich Rainer (PSI) wrote ----

Hello,

I want to control which nodes can become token manager. In detail, I run a virtual machine as quorum node. I don't want this machine to become a token manager - it has no access to Infiniband and only very limited memory.

What I see is that "mmdiag --tokenmgr" lists the machine as an active token manager. The machine has role "quorum-client". This doesn't seem sufficient to exclude it.

Is there any way to tell Spectrum Scale to exclude this single machine with role quorum-client?

I run 5.0.1-1.

Sorry if this is a FAQ, I did search quite a bit before I wrote to the list.

Thank you,
Heiner Billich

[root at node-2 ~]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         node.psi.ch
  GPFS cluster id:           5389874024582403895
  GPFS UID domain:           node.psi.ch
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name     IP address   Admin node name      Designation
--------------------------------------------------------------------------------
   1   node-1.psi.ch        a.b.95.31    node-1.psi.ch        quorum-manager
   2   node-2.psi.ch        a.b.95.32    node-2.psi.ch        quorum-manager
   3   node-quorum.psi.ch   a.b.95.30    node-quorum.psi.ch   quorum          <<<< VIRTUAL MACHINE >>>>>>>>>

[root at node-2 ~]# mmdiag --tokenmgr

=== mmdiag: tokenmgr ===
  Token Domain perf
    There are 3 active token servers in this domain.
    Server list:
      a.b.95.120
      a.b.95.121
      a.b.95.122   <<<< VIRTUAL MACHINE >>>>>>>>>
  Token Domain tiered
    There are 3 active token servers in this domain.
    Server list:
      a.b.95.120
      a.b.95.121
      a.b.95.122   <<<< VIRTUAL MACHINE >>>>>>>>>

-- Paul Scherrer Institut / Science IT / Heiner Billich / WHGA 106 / CH 5232 Villigen PSI / 056 310 36 02 / https://www.psi.ch
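Pulling the suggestions in this thread together, here is a minimal sketch of the knobs under discussion, using the node and file system names from the listings above. As the IBM Spectrum Scale team points out in the next message, token managers for a remotely mounted file system live in the cluster that owns it, so the mmchmgr calls only make sense on that owning (home) cluster; treat this as an illustration rather than a recipe.

    # Which node is currently file system manager for each device?
    mmlsmgr

    # Keep the VM out of the manager pool of the cluster it belongs to.
    # (mmlscluster already shows it without the manager designation, so here
    # this is a no-op; it is included to show where the designation is set.)
    mmchnode --client -N node-quorum.psi.ch

    # Move the file system managers explicitly, as Peter suggests.
    # On a multicluster mount this is run on the cluster that owns the file systems.
    mmchmgr perf node-1.psi.ch
    mmchmgr tiered node-2.psi.ch

    # Re-check which token servers are appointed afterwards
    mmdiag --tokenmgr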
From scale at us.ibm.com Fri Jul 27 18:31:46 2018
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Fri, 27 Jul 2018 12:31:46 -0500
Subject: [gpfsug-discuss] control which hosts become token manager
In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com>
Message-ID:

Only nodes in the home cluster will participate as token managers. Note that "mmdiag --tokenmgr" lists all potential token manager nodes, but there will be additional information for the nodes that are currently appointed.

--tokenmgr
  Displays information about token management. For each mounted GPFS file system, one or more token manager nodes is appointed. The first token manager is always colocated with the file system manager, while other token managers can be appointed from the pool of nodes with the manager designation. The information that is shown here includes the list of currently appointed token manager nodes and, if the current node is serving as a token manager, some statistics about prior token transactions.

Regards, The Spectrum Scale (GPFS) team

From scale at us.ibm.com Fri Jul 27 19:27:19 2018
From: scale at us.ibm.com (IBM Spectrum Scale)
Date: Fri, 27 Jul 2018 20:27:19 +0200
Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data
In-Reply-To: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch>
References: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch>
Message-ID:

Hi,

as more and more similar questions are arising, we have just put an article about the topic on the Spectrum Scale wiki:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data

While there will still be some minor updates to the article, it might already answer your questions.
Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "Dorigo Alvise (PSI)"
To: "gpfsug-discuss at spectrumscale.org"
Date: 13.07.2018 12:08
Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi,
I have a GL2 cluster based on GPFS 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I have the following perfmon configuration for the metric group GPFSNSDDisk:

{
    name = "GPFSNSDDisk"
    period = 2
    restrict = "nsdNodes"
},

which, as far as I know, sends data to the collector every 2 seconds (correct?). But how? Does it send what it reads from the counter every two seconds, or does it aggregate in some way, or something else?

On the collector node, pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the Grafana parameters:
- Down sample (or Disable downsampling)
- Aggregator (following the metrics on the same row)
See the attached picture 4s.png as reference.

In the past I had the period set to 1, and Grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with the aggregator set to "sum", which AFAIK means "sum all the metrics that match the filter below" (again, see the attached picture for how the filter is set to only collect data from the IO nodes).

Today I changed to "period=2" ... and Grafana started to display funny data rates (double, or quadruple, the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as far as I understand doesn't mean much in my case ... average between the two IO nodes? or what?) and "Down sample" (from empty to 2s, and then to 4s) to get back a data rate that is consistent with what I get from dstat.

Can someone kindly explain how to play with these parameters when the zimon sensor's period is changed?

Many thanks in advance.
Regards,

   Alvise Dorigo

[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM]
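A rough numeric sketch of one way the doubled rates Alvise describes can arise. It assumes the bridge keeps answering Grafana queries at a 1-second resolution and up-samples a 2-second sensor sample by repeating it, so a panel that sums buckets over-counts while one that averages (or down-samples to at least the sensor period) does not; the wiki article linked in the reply above is the authoritative description.

    # Suppose an NSD server really writes 100 MiB/s, i.e. one 2-second
    # ZIMon sample covers 200 MiB for that interval.
    period=2            # new sensor period in seconds
    mib_per_sample=200  # data covered by one sample

    # Up-sampling repeats the sample into two 1-second buckets; summing
    # them counts the 200 MiB twice: 400 MiB over a 2-second window.
    summed=$(( mib_per_sample * 2 / period ))   # 200 MiB/s shown, double the truth

    # Averaging the repeated buckets (or down-sampling to >= the sensor
    # period) recovers the real rate.
    averaged=$(( mib_per_sample / period ))     # 100 MiB/s, matching dstat

    echo "sum aggregator shows: ${summed} MiB/s"
    echo "avg aggregator shows: ${averaged} MiB/s"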
From Achim.Rehor at de.ibm.com Sat Jul 28 10:16:04 2018
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Sat, 28 Jul 2018 11:16:04 +0200
Subject: [gpfsug-discuss] control which hosts become token manager
In-Reply-To: References: <366795a1f7b34edc985d85124f787774@jumptrading.com>
Message-ID:

An HTML attachment was scrubbed...

From S.J.Thompson at bham.ac.uk Mon Jul 30 16:27:28 2018
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Mon, 30 Jul 2018 15:27:28 +0000
Subject: [gpfsug-discuss] Power9 / GPFS
In-Reply-To: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk>
References: <23C90277-1A82-4802-9902-6BB7149B4563@bham.ac.uk>
Message-ID: <24C8CF4A-D0D9-4DC0-B499-6B64D50DF3BC@bham.ac.uk>

Just to close the loop on this: this is a bug in the first-shipped RHEL 7.5 alt kernel for the P9 systems. Patching to a later kernel errata package fixed the issues. I've confirmed that upgrading and re-running the installer works fine.

Thanks to Julian who contacted me off-list about this.

Simon
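A minimal sketch of what Simon's fix amounts to in practice: confirm whether the box is still on the affected factory kernel, move to the errata level that Glen's output further down shows working, and re-run the self-extractor. The yum line is an assumption; the exact package and repository names depend on how the RHEL-ALT errata channel is attached on your systems.

    # The factory image ships the affected kernel
    uname -r                     # 4.14.0-49.el7a.ppc64le -> the bundled JRE segfaults in the LAP tool

    # Pull in the kernel errata and reboot (repository setup is site-specific)
    yum update 'kernel*' && reboot

    # On 4.14.0-49.2.2.el7a or later, re-run the installer
    ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only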
From: on behalf of Simon Thompson
Reply-To: "gpfsug-discuss at spectrumscale.org"
Date: Friday, 27 July 2018 at 17:06
To: "gpfsug-discuss at spectrumscale.org"
Subject: Re: [gpfsug-discuss] Power9 / GPFS

# uname -a
Linux localhost.localdomain 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

It's literally out of the box ...

Simon

From: on behalf of "gcorneau at us.ibm.com"
Reply-To: "gpfsug-discuss at spectrumscale.org"
Date: Friday, 27 July 2018 at 17:03
To: "gpfsug-discuss at spectrumscale.org"
Subject: Re: [gpfsug-discuss] Power9 / GPFS

Just curious, do you have the zStream1 patches installed?

# uname -a
Linux ac922a.pvw.ibm.com 4.14.0-49.2.2.el7a.ppc64le #1 SMP Fri Apr 27 15:37:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

------------------
Glen Corneau
Cognitive Systems
Washington Systems Center
gcorneau at us.ibm.com

From: Simon Thompson
To: "gpfsug-discuss at spectrumscale.org"
Date: 07/27/2018 10:19 AM
Subject: [gpfsug-discuss] Power9 / GPFS
Sent by: gpfsug-discuss-bounces at spectrumscale.org
________________________________

I feel like I must be doing something stupid here, but ...

We're trying to install GPFS onto some Power 9 AI systems we've just got. So from Fix Central we download "Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install", however we are failing to unpack the file:

./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 --text-only

Extracting License Acceptance Process Tool to 5.0.1.1 ...
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm --exclude=*tgz --exclude=*deb 1> /dev/null

Installing JRE ...
If directory 5.0.1.1 has been created or was previously created during another extraction, .rpm, .deb, and repository related files in it (if there were) will be removed to avoid conflicts with the ones being extracted.
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | tar -C 5.0.1.1 --wildcards -xvz ibm-java*tgz 1> /dev/null
tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz

Invoking License Acceptance Process Tool ...
5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1 -text_only

Unhandled exception
Type=Segmentation error vmState=0xffffffff
J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FFFB194FC80 Handler2=00007FFFB176EA40
R0=00007FFFB176A0E8 R1=00007FFFB23AC5D0 R2=00007FFFB2737400 R3=0000000000000000
R4=00007FFFB17D2AA4 R5=0000000000000006 R6=0000000000000000 R7=00007FFFAC12A3C0

This looks like the java runtime is failing during the license approval step.

First off, can someone confirm that "Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install" is indeed the correct package we should be downloading for Power9, and then any tips on how to extract the packages?

These systems are running the IBM factory-shipped install of RedHat 7.5.

Thanks

Simon

From Renar.Grunenberg at huk-coburg.de Tue Jul 31 10:03:54 2018
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Tue, 31 Jul 2018 09:03:54 +0000
Subject: [gpfsug-discuss] Question about mmsdrrestore
Message-ID: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de>

Hallo All,

are there any experiences with the following: can we take existing nodes in a GPFS 4.2.3.x cluster (OS RHEL 6.7), do a fresh OS install of RHEL 7.5, install the new GPFS 5.0.1.1 code directly, and then run mmsdrrestore on these nodes from a 4.2.3 node? Is that possible, or must we install the 4.2.3 code first, do the mmsdrrestore step, and then update to 5.0.1.1?

Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik - Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: Renar.Grunenberg at huk-coburg.de
Internet: www.huk.de
________________________________
HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.
________________________________
Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet.
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.
________________________________

From r.sobey at imperial.ac.uk Tue Jul 31 10:09:37 2018
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Tue, 31 Jul 2018 09:09:37 +0000
Subject: [gpfsug-discuss] Question about mmsdrrestore
In-Reply-To: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de>
References: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de>
Message-ID:

My gut feeling says it's not possible. If this were me I'd upgrade to 5.0.1.1 first, make sure it's working, and then reinstall the node.
Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Grunenberg, Renar
Sent: 31 July 2018 10:04
To: 'gpfsug-discuss at spectrumscale.org'
Subject: [gpfsug-discuss] Question about mmsdrrestore

Hallo All,
are there any experiences with the following: can we take existing nodes in a GPFS 4.2.3.x cluster (OS RHEL 6.7), do a fresh OS install of RHEL 7.5, install the new GPFS 5.0.1.1 code directly, and then run mmsdrrestore on these nodes from a 4.2.3 node? Is that possible, or must we install the 4.2.3 code first, do the mmsdrrestore step, and then update to 5.0.1.1?
Any hints are appreciated.

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG, Bahnhofsplatz, 96444 Coburg
Telefon: 09561 96-44110

From Renar.Grunenberg at huk-coburg.de Tue Jul 31 14:03:52 2018
From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar)
Date: Tue, 31 Jul 2018 13:03:52 +0000
Subject: [gpfsug-discuss] mmdf vs. df
Message-ID:

Hallo All,

a question about what is happening here: we are on GPFS 5.0.1.1 and host a TSM server cluster. A colleague of mine wants to add new NSDs to grow his TSM storage pool (file device class volumes). The tsmpool file system had 45 TB of space before and 128 TB afterwards. We created new 50 GB TSM volumes with the define volume cmd, but the cmd fails after allocating 89 TB.
Here are the outputs:

[root at node_a tsmpool]# df -hT
Filesystem  Type  Size  Used Avail Use% Mounted on
tsmpool     gpfs  128T  128T   44G 100% /gpfs/tsmpool

[root at node_a tsmpool]# mmdf tsmpool --block-size auto
disk                   disk size  failure holds    holds            free                 free
name                              group   metadata data       in full blocks        in fragments
---------------       ---------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 839.99 GB)
nsd_r2g8f_tsmpool_001       100G        0 Yes      No            88G ( 88%)        10.4M ( 0%)
nsd_c4g8f_tsmpool_001       100G        1 Yes      No            88G ( 88%)        10.4M ( 0%)
nsd_g4_tsmpool              256M        2 No       No              0 (  0%)            0 ( 0%)
                      ----------                          -------------------- -------------------
(pool total)              200.2G                                 176G ( 88%)        20.8M ( 0%)

Disks in storage pool: data01 (Maximum disk size allowed is 133.50 TB)
nsd_r2g8d_tsmpool_016         8T        0 No       Yes        3.208T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_015         8T        0 No       Yes        3.205T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_014         8T        0 No       Yes        3.208T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_013         8T        0 No       Yes        3.206T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_012         8T        0 No       Yes        3.208T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_011         8T        0 No       Yes        3.205T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_001         8T        0 No       Yes         1.48G (  0%)       14.49M ( 0%)
nsd_r2g8d_tsmpool_002         8T        0 No       Yes        1.582G (  0%)       16.12M ( 0%)
nsd_r2g8d_tsmpool_003         8T        0 No       Yes        1.801G (  0%)        14.7M ( 0%)
nsd_r2g8d_tsmpool_004         8T        0 No       Yes        1.629G (  0%)       15.21M ( 0%)
nsd_r2g8d_tsmpool_005         8T        0 No       Yes        1.609G (  0%)       14.22M ( 0%)
nsd_r2g8d_tsmpool_006         8T        0 No       Yes        1.453G (  0%)        17.4M ( 0%)
nsd_r2g8d_tsmpool_010         8T        0 No       Yes        3.208T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_009         8T        0 No       Yes        3.197T ( 40%)       7.867M ( 0%)
nsd_r2g8d_tsmpool_007         8T        0 No       Yes        3.194T ( 40%)       7.875M ( 0%)
nsd_r2g8d_tsmpool_008         8T        0 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_016         8T        1 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_006         8T        1 No       Yes          888M (  0%)       21.63M ( 0%)
nsd_c4g8d_tsmpool_005         8T        1 No       Yes          996M (  0%)       18.22M ( 0%)
nsd_c4g8d_tsmpool_004         8T        1 No       Yes          920M (  0%)       11.21M ( 0%)
nsd_c4g8d_tsmpool_003         8T        1 No       Yes          984M (  0%)        14.7M ( 0%)
nsd_c4g8d_tsmpool_002         8T        1 No       Yes        1.082G (  0%)       11.89M ( 0%)
nsd_c4g8d_tsmpool_001         8T        1 No       Yes        1.035G (  0%)       14.49M ( 0%)
nsd_c4g8d_tsmpool_007         8T        1 No       Yes        3.281T ( 41%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_008         8T        1 No       Yes        3.199T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_009         8T        1 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_010         8T        1 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_011         8T        1 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_012         8T        1 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_013         8T        1 No       Yes        3.195T ( 40%)       7.867M ( 0%)
nsd_c4g8d_tsmpool_014         8T        1 No       Yes        3.195T ( 40%)       7.875M ( 0%)
nsd_c4g8d_tsmpool_015         8T        1 No       Yes        3.194T ( 40%)       7.867M ( 0%)
                      ----------                          -------------------- -------------------
(pool total)                256T                               64.09T ( 25%)       341.6M ( 0%)
                      ==========                          ==================== ===================
(data)                      256T                               64.09T ( 25%)       341.6M ( 0%)
(metadata)                  200G                                 176G ( 88%)        20.8M ( 0%)
                      ==========                          ==================== ===================
(total)                   256.2T                               64.26T ( 25%)       362.4M ( 0%)

In GPFS we clearly still have space, but the df output above looks wrong, and that makes TSM unhappy.
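For orientation, a back-of-the-envelope comparison of the two outputs above. It assumes two-way data replication, which is what the df size of 128T (exactly half of the 256T raw pool size reported by mmdf) suggests; under that assumption the Avail column, not the pool, is the suspect.

    # raw figures from mmdf for pool data01
    raw_total_tib=256
    raw_free_tib=64        # 64.09T, i.e. 25% free

    # with two data replicas, user-visible capacity is roughly half the raw capacity
    echo "expected df Size : $(( raw_total_tib / 2 )) TiB"   # 128 TiB -> matches df
    echo "expected df Avail: $(( raw_free_tib  / 2 )) TiB"   # ~32 TiB -> df instead shows 44G

So the Size column agrees with mmdf while Avail does not, which fits Renar's point that the statfs reporting path (together with the ignore* overrides he lists below), rather than the pool actually being full, is what looks off.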
If we manually write a 50 GB file into this file system:

[root at sap00733 tsmpool]# dd if=/dev/zero of=/gpfs/tsmpool/output bs=2M count=25600
25600+0 records in
25600+0 records out
53687091200 bytes (54 GB) copied, 30.2908 s, 1.8 GB/s

we now see this at the df level:

[root at sap00733 tsmpool]# df -hT
Filesystem  Type  Size  Used Avail Use% Mounted on
tsmpool     gpfs  128T   96T   33T  75% /gpfs/tsmpool

If we delete that file, we are back to the first output with only 44G of free space. The df interface of the OS seems to be broken here. I should also mention that we use some ignore parameters:

root @node_a(rhel7.4)> mmfsadm dump config | grep ignore
   ignoreNonDioInstCount 0
 ! ignorePrefetchLUNCount 1
   ignoreReplicaSpaceOnStat 0
   ignoreReplicationForQuota 0
 ! ignoreReplicationOnStatfs 1
   ignoreSync 0

The file system has the -S relatime option set. Are there any known bugs in this area? Any hints on that?

Regards,
Renar

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG, Bahnhofsplatz, 96444 Coburg
Telefon: 09561 96-44110
E-Mail: Renar.Grunenberg at huk-coburg.de

From ralf.utermann at physik.uni-augsburg.de Tue Jul 31 16:02:51 2018
From: ralf.utermann at physik.uni-augsburg.de (Ralf Utermann)
Date: Tue, 31 Jul 2018 17:02:51 +0200
Subject: [gpfsug-discuss] Question about mmsdrrestore
In-Reply-To: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de>
References: <433249243e7a4516976293a9f7f781e5@SMXRF105.msg.hukrf.de>
Message-ID: <1de976d6-bc61-b1ff-b953-b28886f8e2c4@physik.uni-augsburg.de>

Hi Renar,

we reinstalled a previous Debian jessie + GPFS 4.2.3 client to Ubuntu 16.04 + GPFS 5.0.1-1 and did a mmsdrrestore from one of our 4.2.3.8 NSD servers without problems.

regards, Ralf

On 31.07.2018 11:03, Grunenberg, Renar wrote:
> Hallo All,
> are there any experiences with the following: can we take existing nodes in a GPFS 4.2.3.x cluster
> (OS RHEL 6.7), do a fresh OS install of RHEL 7.5, install the new GPFS 5.0.1.1 code directly, and
> then run mmsdrrestore on these nodes from a 4.2.3 node? Is that possible, or must we install the
> 4.2.3 code first, do the mmsdrrestore step, and then update to 5.0.1.1?
> Any hints are appreciated.
> Renar Grunenberg
> Abteilung Informatik - Betrieb
> HUK-COBURG, Bahnhofsplatz, 96444 Coburg
> Telefon: 09561 96-44110
> E-Mail: Renar.Grunenberg at huk-coburg.de
> Internet: www.huk.de

--
Ralf Utermann
_____________________________________________________________________
Universität Augsburg, Institut für Physik -- EDV-Betreuer
Universitätsstr. 1, D-86135 Augsburg
Phone: +49-821-598-3231
SMTP: Ralf.Utermann at Physik.Uni-Augsburg.DE
Fax: -3411
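For reference, a minimal sketch of the reinstall-and-recover sequence Ralf describes, with a placeholder hostname for the surviving NSD server; the idea is that mmsdrrestore pulls the cluster configuration from a node that still has it, after which the freshly installed node can be started normally.

    # on the freshly reinstalled node, after installing the GPFS 5.0.1 packages
    mmbuildgpl                                   # build the portability layer for the new kernel

    # recover the /var/mmfs configuration from a healthy cluster member
    # (nsd01.example.com is a placeholder for one of the existing 4.2.3 NSD servers)
    mmsdrrestore -p nsd01.example.com -R /usr/bin/scp

    # start GPFS on the restored node and check that it rejoins the cluster
    mmstartup
    mmgetstate -a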