From Robert.Oesterlin at nuance.com Tue May 2 01:24:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 2 May 2017 00:24:44 +0000 Subject: [gpfsug-discuss] Supercomputing Hotels 2017 Hotels - Reserve Early! Message-ID: <7ED8704A-698A-4109-B843-EB6E8FF07478@nuance.com> Hotel reservations for the Supercomputing conference opened today, and the rooms are filling up VERY fast. My advice to everyone is that if you are at all considering going - reserve now. You can do so at no charge and can cancel for free up till mid-October. Cheap and close hotels already have some dates filled up. http://sc17.supercomputing.org/attendees/attendee-housing/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue May 2 10:58:02 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Tue, 2 May 2017 09:58:02 +0000 Subject: [gpfsug-discuss] Meet other spectrum scale users in May In-Reply-To: <1f483faa9cb61dcdc80afb187e908745@webmail.gpfsug.org> References: <1f483faa9cb61dcdc80afb187e908745@webmail.gpfsug.org> Message-ID: Hi All, Just to note that we need to send final numbers of the venue today for lunches etc, so if you are planning to attend, please register NOW! (otherwise you might not get lunch/entry to the evening event) Thanks Simon From: > on behalf of Secretary GPFS UG > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 27 April 2017 at 09:29 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Meet other spectrum scale users in May Dear Members, Please join us and other spectrum scale users for 2 days of great talks and networking! When: 9-10th May 2017 Where: Macdonald Manchester Hotel & Spa, Manchester, UK (right by Manchester Piccadilly train station) Who? The event is free to attend, is open to members from all industries and welcomes users with a little and a lot of experience using Spectrum Scale. The SSUG brings together the Spectrum Scale User Community including Spectrum Scale developers and architects to share knowledge, experiences and future plans. Topics include transparent cloud tiering, AFM, automation and security best practices, Docker and HDFS support, problem determination, and an update on Elastic Storage Server (ESS). Our popular forum includes interactive problem solving, a best practices discussion and networking. We're very excited to welcome back Doris Conti the Director for Spectrum Scale (GPFS) and HPC SW Product Development at IBM. The May meeting is sponsored by IBM, DDN, Lenovo, Mellanox, Seagate, Arcastream, Ellexus, and OCF. It is an excellent opportunity to learn more and get your questions answered. Register your place today at the Eventbrite page https://goo.gl/tRptru We hope to see you there! -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at wustl.edu Tue May 2 21:21:42 2017 From: mweil at wustl.edu (Matt Weil) Date: Tue, 2 May 2017 15:21:42 -0500 Subject: [gpfsug-discuss] AFM Message-ID: Hello all, Is there any way to rate limit the AFM traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From scale at us.ibm.com Wed May 3 02:37:52 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 2 May 2017 21:37:52 -0400 Subject: [gpfsug-discuss] AFM In-Reply-To: References: Message-ID: Not that I am aware and QoS is not supported with any of the AFM traffic. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Matt Weil To: gpfsug main discussion list Date: 05/02/2017 04:22 PM Subject: [gpfsug-discuss] AFM Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello all, Is there any way to rate limit the AFM traffic? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed May 3 03:20:24 2017 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 3 May 2017 02:20:24 +0000 Subject: [gpfsug-discuss] AFM In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From martinsworkmachine at gmail.com Wed May 3 13:29:43 2017 From: martinsworkmachine at gmail.com (J Martin Rushton) Date: Wed, 3 May 2017 13:29:43 +0100 Subject: [gpfsug-discuss] Introduction Message-ID: Hi All As requested here is a brief introduction. I run a small cluster of 41 Linux nodes and we use GPFS for the user filesystems, user applications and a bunch of stuff in /opt. Backup/Archive is by Tivoli. Most user work is batch, with run times up to a couple of months (which makes updates a problem at times). I'm based near Sevenoaks in Kent, England. Regards, Martin From SAnderson at convergeone.com Wed May 3 18:08:36 2017 From: SAnderson at convergeone.com (Shaun Anderson) Date: Wed, 3 May 2017 17:08:36 +0000 Subject: [gpfsug-discuss] Tiebreaker disk question Message-ID: <1493831316163.52984@convergeone.com> We noticed some odd behavior recently. I have a customer with a small Scale (with Archive on top) configuration that we recently updated to a dual node configuration. 
We are using CES and setup a very small 3 nsd shared-root filesystem(gpfssr). We also set up tiebreaker disks and figured it would be ok to use the gpfssr NSDs for this purpose. When we tried to perform some basic failover testing, both nodes came down. It appears from the logs that when we initiated the node failure (via mmshutdown command...not great, I know) it unmounts and remounts the shared-root filesystem. When it did this, the cluster lost access to the tiebreaker disks, figured it had lost quorum and the other node came down as well. We got around this by changing the tiebreaker disks to our other normal gpfs filesystem. After that failover worked as expected. This is documented nowhere as far as I could find?. I wanted to know if anybody else had experienced this and if this is expected behavior. All is well now and operating as we want so I don't think we'll pursue a support request. Regards, SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments here to may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Thu May 4 06:27:11 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 04 May 2017 05:27:11 +0000 Subject: [gpfsug-discuss] Tiebreaker disk question In-Reply-To: <1493831316163.52984@convergeone.com> References: <1493831316163.52984@convergeone.com> Message-ID: This doesn't sound like normal behaviour. It shouldn't matter which filesystem your tiebreaker disks belong to. I think the failure was caused by something else, but am not able to guess from the little information you posted.. The mmfs.log will probably tell you the reason. -jf ons. 3. mai 2017 kl. 19.08 skrev Shaun Anderson : > We noticed some odd behavior recently. I have a customer with a small > Scale (with Archive on top) configuration that we recently updated to a > dual node configuration. We are using CES and setup a very small 3 > nsd shared-root filesystem(gpfssr). We also set up tiebreaker disks and > figured it would be ok to use the gpfssr NSDs for this purpose. > > > When we tried to perform some basic failover testing, both nodes came > down. It appears from the logs that when we initiated the node failure > (via mmshutdown command...not great, I know) it unmounts and remounts the > shared-root filesystem. When it did this, the cluster lost access to the > tiebreaker disks, figured it had lost quorum and the other node came down > as well. > > > We got around this by changing the tiebreaker disks to our other normal > gpfs filesystem. After that failover worked as expected. This is > documented nowhere as far as I could find?. I wanted to know if anybody > else had experienced this and if this is expected behavior. All is well > now and operating as we want so I don't think we'll pursue a support > request. > > > Regards, > > *SHAUN ANDERSON* > STORAGE ARCHITECT > O 208.577.2112 > M 214.263.7014 > > > NOTICE: This email message and any attachments here to may contain > confidential > information. Any unauthorized review, use, disclosure, or distribution of > such > information is prohibited. 
If you are not the intended recipient, please > contact > the sender by reply email and destroy the original message and all copies > of it. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 08:56:09 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 09:56:09 +0200 Subject: [gpfsug-discuss] Tiebreaker disk question In-Reply-To: References: <1493831316163.52984@convergeone.com> Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 4 14:15:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 4 May 2017 13:15:40 +0000 Subject: [gpfsug-discuss] HAWC question Message-ID: Hi, I have a question about HAWC, we are trying to enable this for our OpenStack environment, system pool is on SSD already, so we try to change the log file size with: mmchfs FSNAME -L 128M This says: mmchfs: Attention: You must restart the GPFS daemons before the new log file size takes effect. The GPFS daemons can be restarted one node at a time. When the GPFS daemon is restarted on the last node in the cluster, the new log size becomes effective. We multi-cluster the file-system, so do we have to restart every node in all clusters, or just in the storage cluster? And how do we tell once it has become active? Thanks Simon From kenneth.waegeman at ugent.be Thu May 4 14:22:25 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 4 May 2017 15:22:25 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi, We found out using ib_read_bw and ib_write_bw that there were some links between server and clients degraded, having a bandwith of 350MB/s strangely, nsdperf did not report the same. It reported 12GB/s write and 9GB/s read, which was much more then we actually could achieve. So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! Many thanks for the help !! We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? Cheers! Kenneth On 21/04/17 22:27, Kumaran Rajaram wrote: > Hi Kenneth, > > As it was mentioned earlier, it will be good to first verify the raw > network performance between the NSD client and NSD server using the > nsdperf tool that is built with RDMA support. 
> g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C > > In addition, since you have 2 x NSD servers it will be good to perform > NSD client file-system performance test with just single NSD server > (mmshutdown the other server, assuming all the NSDs have primary, > server NSD server configured + Quorum will be intact when a NSD server > is brought down) to see if it helps to improve the read performance + > if there are variations in the file-system read bandwidth results > between NSD_server#1 'active' vs. NSD_server #2 'active' (with other > NSD server in GPFS "down" state). If there is significant variation, > it can help to isolate the issue to particular NSD server (HW or IB > issue?). > > You can issue "mmdiag --waiters" on NSD client as well as NSD servers > during your dd test, to verify if there are unsual long GPFS waiters. > In addition, you may issue Linux "perf top -z" command on the GPFS > node to see if there is high CPU usage by any particular call/event > (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been > set to low value from the default 16M, then it can cause RDMA > completion threads to go CPU bound ). Please verify some performance > scenarios detailed in Chapter 22 in Spectrum Scale Problem > Determination Guide (link below). > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc > > Thanks, > -Kums > > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/21/2017 11:43 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Hi, > > We already verified this on our nsds: > > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed > QpiSpeed=maxdatarate > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode > turbomode=enable > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile > SysProfile=perfoptimized > > so sadly this is not the issue. > > Also the output of the verbs commands look ok, there are connections > from the client to the nsds are there is data being read and writen. > > Thanks again! > > Kenneth > > > On 21/04/17 16:01, Kumaran Rajaram wrote: > Hi, > > Try enabling the following in the BIOS of the NSD servers (screen > shots below) > > * Turbo Mode - Enable > * QPI Link Frequency - Max Performance > * Operating Mode - Maximum Performance > * >>>>While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total > (each nsd server seems limited by 0,5GB/s) independent of the > number of clients > > >>We are testing from 2 testing machines connected to the nsds > with infiniband, verbs enabled. > > > Also, It will be good to verify that all the GPFS nodes have Verbs > RDMA started using "mmfsadm test verbs status" and that the NSD > client-server communication from client to server during "dd" is > actually using Verbs RDMA using "mmfsadm test verbs conn" command (on > NSD client doing dd). If not, then GPFS might be using TCP/IP network > over which the cluster is configured impacting performance (If this is > the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and > resolve). > > * > > > > > > > Regards, > -Kums > > > > > > > From: "Knister, Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" > __ > To: gpfsug main discussion list __ > > Date: 04/21/2017 09:11 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: _gpfsug-discuss-bounces at spectrumscale.org_ > > ------------------------------------------------------------------------ > > > > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. > > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > We are running a test setup with 2 NSD Servers backed by 4 Dell > Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of > the 4 powervaults, nsd02 is primary serving LUNS of controller B. > > We are testing from 2 testing machines connected to the nsds with > infiniband, verbs enabled. > > When we do dd from the NSD servers, we see indeed performance going to > 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is > able to get the data at a decent speed. Since we can write from the > clients at a good speed, I didn't suspect the communication between > clients and nsds being the issue, especially since total performance > stays the same using 1 or multiple clients. > > I'll use the nsdperf tool to see if we can find anything, > > thanks! > > K > > On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > _https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf_ > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > Having an issue that looks the same as this one: > > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is the expected speed for our current storage > backend. While we have even better performance with sequential reads > on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each > nsd server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed > in this thread, but nothing seems to impact this read performance. > > Any ideas? > > Thanks! > > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. 
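For anyone mining this thread from the archive: the tuning Jan-Frode describes maps onto mmchconfig roughly as sketched below. The parameter names are standard GPFS configuration options, but the values are simply the ones quoted in this exchange and "nsdclients" is a placeholder node class, so treat this as an illustration rather than a recommendation.

# Inspect the current values
mmlsconfig maxMBpS
mmlsconfig prefetchThreads

# One fix reported above: cap maxMBpS well below the 10000 default
mmchconfig maxMBpS=100 -N nsdclients

# The alternative fix reported above: lower prefetchThreads from 72 to 32
# (depending on release this may only take effect after restarting the
# GPFS daemon on the affected nodes)
mmchconfig prefetchThreads=32 -N nsdclients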
> > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > <_aaron.s.knister at nasa.gov_ >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load > up on one socket, you push all the interrupt handling to the other > socket where the fabric card is attached? > > > > Dunno ... (Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: _gpfsug-discuss-bounces at spectrumscale.org_ > [_gpfsug-discuss-bounces at spectrumscale.org_ > ] on behalf of Aaron > Knister [_aaron.s.knister at nasa.gov_ ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
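The frequency-scaling checks discussed in this part of the thread amount to something like the following. This is a sketch only, reusing the artificial-load trick Aaron mentions; tool availability depends on the distribution installed on the NSD servers.

# Governor setting and per-core frequency / idle-state residency
cpupower frequency-info
cpupower monitor

# Drive artificial CPU load while repeating the dd test, as described above
openssl speed -multi 16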
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _spectrumscale.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From oehmes at gmail.com Thu May 4 14:28:20 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 04 May 2017 13:28:20 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: Message-ID: well, it's a bit complicated which is why the message is there in the first place. reason is, there is no easy way to tell except by dumping the stripgroup on the filesystem manager and check what log group your particular node is assigned to and then check the size of the log group. as soon as the client node gets restarted it should in most cases pick up a new log group and that should be at the new size, but to be 100% sure we say all nodes need to be restarted. 
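A minimal sketch of the related commands, assuming the FSNAME placeholder from the original mail; flag spellings should be double-checked against the Scale release in use.

# Configured recovery log size and HAWC write-cache threshold
mmlsfs FSNAME -L
mmlsfs FSNAME --write-cache-threshold

# HAWC itself is switched on via the write cache threshold (0 disables it,
# values up to 64K enable it), for example:
mmchfs FSNAME --write-cache-threshold 64K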
you need to also turn HAWC on as well, i assume you just left this out of the email , just changing log size doesn't turn it on :-) On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Hi, > > I have a question about HAWC, we are trying to enable this for our > OpenStack environment, system pool is on SSD already, so we try to change > the log file size with: > > mmchfs FSNAME -L 128M > > This says: > > mmchfs: Attention: You must restart the GPFS daemons before the new log > file > size takes effect. The GPFS daemons can be restarted one node at a time. > When the GPFS daemon is restarted on the last node in the cluster, the new > log size becomes effective. > > > We multi-cluster the file-system, so do we have to restart every node in > all clusters, or just in the storage cluster? > > And how do we tell once it has become active? > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 4 14:39:33 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 4 May 2017 13:39:33 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: , Message-ID: Which cluster though? The client and storage are separate clusters, so all the nodes on the remote cluster or storage cluster? Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of oehmes at gmail.com [oehmes at gmail.com] Sent: 04 May 2017 14:28 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HAWC question well, it's a bit complicated which is why the message is there in the first place. reason is, there is no easy way to tell except by dumping the stripgroup on the filesystem manager and check what log group your particular node is assigned to and then check the size of the log group. as soon as the client node gets restarted it should in most cases pick up a new log group and that should be at the new size, but to be 100% sure we say all nodes need to be restarted. you need to also turn HAWC on as well, i assume you just left this out of the email , just changing log size doesn't turn it on :-) On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) > wrote: Hi, I have a question about HAWC, we are trying to enable this for our OpenStack environment, system pool is on SSD already, so we try to change the log file size with: mmchfs FSNAME -L 128M This says: mmchfs: Attention: You must restart the GPFS daemons before the new log file size takes effect. The GPFS daemons can be restarted one node at a time. When the GPFS daemon is restarted on the last node in the cluster, the new log size becomes effective. We multi-cluster the file-system, so do we have to restart every node in all clusters, or just in the storage cluster? And how do we tell once it has become active? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu May 4 15:06:10 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 04 May 2017 14:06:10 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: Message-ID: let me clarify and get back, i am not 100% sure on a cross cluster , i think the main point was that the FS manager for that fs should be reassigned (which could also happen via mmchmgr) and then the individual clients that mount that fs restarted , but i will double check and reply later . On Thu, May 4, 2017 at 6:39 AM Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Which cluster though? The client and storage are separate clusters, so all > the nodes on the remote cluster or storage cluster? > > Thanks > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of oehmes at gmail.com [ > oehmes at gmail.com] > Sent: 04 May 2017 14:28 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] HAWC question > > well, it's a bit complicated which is why the message is there in the > first place. > > reason is, there is no easy way to tell except by dumping the stripgroup > on the filesystem manager and check what log group your particular node is > assigned to and then check the size of the log group. > > as soon as the client node gets restarted it should in most cases pick up > a new log group and that should be at the new size, but to be 100% sure we > say all nodes need to be restarted. > > you need to also turn HAWC on as well, i assume you just left this out of > the email , just changing log size doesn't turn it on :-) > > On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) < > S.J.Thompson at bham.ac.uk> wrote: > Hi, > > I have a question about HAWC, we are trying to enable this for our > OpenStack environment, system pool is on SSD already, so we try to change > the log file size with: > > mmchfs FSNAME -L 128M > > This says: > > mmchfs: Attention: You must restart the GPFS daemons before the new log > file > size takes effect. The GPFS daemons can be restarted one node at a time. > When the GPFS daemon is restarted on the last node in the cluster, the new > log size becomes effective. > > > We multi-cluster the file-system, so do we have to restart every node in > all clusters, or just in the storage cluster? > > And how do we tell once it has become active? > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 16:24:41 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 15:24:41 +0000 Subject: [gpfsug-discuss] Well, this is the pits... Message-ID: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Hi All, Another one of those, ?I can open a PMR if I need to? type questions? 
We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 16:34:34 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 17:34:34 +0200 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 16:43:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 15:43:56 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Message-ID: <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. 
than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 4 16:45:53 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 4 May 2017 15:45:53 +0000 Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance Message-ID: Hi all, I'm going through the steps outlines in the following article: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_managecertforgui.htm Will this work for 4.2.1 installations? Only because in step 5, "Generate a Java(tm) keystore file (.jks) by using the keytool. It is stored in the following directory:", the given directory - /opt/ibm/wlp/java/jre/bin - does not exist. Only the path upto and including wlp is on my GUI server. I can't imagine the instructions being so different between 4.2.1 and 4.2 but I've seen it happen.. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 16:54:50 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 17:54:50 +0200 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu May 4 16:55:36 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 4 May 2017 15:55:36 +0000 Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance In-Reply-To: References: Message-ID: Never mind - /usr/lpp/mmfs/java/jre/bin is where it's at. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 May 2017 16:46 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance Hi all, I'm going through the steps outlines in the following article: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_managecertforgui.htm Will this work for 4.2.1 installations? Only because in step 5, "Generate a Java(tm) keystore file (.jks) by using the keytool. It is stored in the following directory:", the given directory - /opt/ibm/wlp/java/jre/bin - does not exist. Only the path upto and including wlp is on my GUI server. I can't imagine the instructions being so different between 4.2.1 and 4.2 but I've seen it happen.. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 17:07:32 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 16:07:32 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Message-ID: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? 
QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Thu May 4 17:11:53 2017 From: salut4tions at gmail.com (Jordan Robertson) Date: Thu, 4 May 2017 12:11:53 -0400 Subject: [gpfsug-discuss] Well, this is the pits... 
In-Reply-To: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Kevin, The math currently used in the code appears to be "greater than 31 NSD's in the filesystem" combined with "greater than 31 pit worker threads", explicitly for a balancing restripe (we actually hit that combo on an older version of 3.5.x before the safety got written in there...it was a long day). At least, that's the apparent math used through 4.1.1.10, which we're currently running. If pitWorkerThreadsPerNode is set to 0 (default), GPFS should set the active thread number equal to the number of cores in the node, to a max of 16 threads I believe. Take in mind that for a restripe, it will also include the threads available on the fs manager. So if your fs manager and at least one helper node are both set to "0", and each contains at least 16 cores, the restripe "thread calculation" will exceed 31 threads so it won't run. We've had to tune our helper nodes to lower numbers (e.g a single helper node to 15 threads). Aaron please correct me if I'm braining that wrong anywhere. -Jordan On Thu, May 4, 2017 at 12:07 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Olaf, > > I didn?t touch pitWorkerThreadsPerNode ? it was already zero. > > I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or > 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes > this? With what I?m doing I need the ability to run mmrestripefs. > > It seems to me that mmrestripefs could check whether QOS is enabled ? > granted, it would have no way of knowing whether the values used actually > are reasonable or not ? but if QOS is enabled then ?trust? it to not > overrun the system. > > PMR time? Thanks.. > > Kevin > > On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: > > HI Kevin, > the number of NSDs is more or less nonsense .. it is just the number of > nodes x PITWorker should not exceed to much the #mutex/FS block > did you adjust/tune the PitWorker ? ... > > so far as I know.. that the code checks the number of NSDs is already > considered as a defect and will be fixed / is already fixed ( I stepped > into it here as well) > > ps. QOS is the better approach to address this, but unfortunately.. not > everyone is using it by default... that's why I suspect , the development > decide to put in a check/limit here .. which in your case(with QOS) > would'nt needed > > > > > > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM > Subject: Re: [gpfsug-discuss] Well, this is the pits... > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Olaf, > > Your explanation mostly makes sense, but... > > Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. > And this filesystem only has 32 disks, which I would imagine is not an > especially large number compared to what some people reading this e-mail > have in their filesystems. > > I thought that QOS (which I?m using) was what would keep an mmrestripefs > from overrunning the system ? QOS has worked extremely well for us - it?s > one of my favorite additions to GPFS. > > Kevin > > On May 4, 2017, at 10:34 AM, Olaf Weiser <*olaf.weiser at de.ibm.com* > > wrote: > > no.. 
it is just in the code, because we have to avoid to run out of mutexs > / block > > reduce the number of nodes -N down to 4 (2nodes is even more safer) ... > is the easiest way to solve it for now.... > > I've been told the real root cause will be fixed in one of the next ptfs > .. within this year .. > this warning messages itself should appear every time.. but unfortunately > someone coded, that it depends on the number of disks (NSDs).. that's why I > suspect you did'nt see it before > but the fact , that we have to make sure, not to overrun the system by > mmrestripe remains.. to please lower the -N number of nodes to 4 or better > 2 > > (even though we know.. than the mmrestripe will take longer) > > > From: "Buterbaugh, Kevin L" <*Kevin.Buterbaugh at Vanderbilt.Edu* > > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 05/04/2017 05:26 PM > Subject: [gpfsug-discuss] Well, this is the pits... > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > Hi All, > > Another one of those, ?I can open a PMR if I need to? type questions? > > We are in the process of combining two large GPFS filesystems into one new > filesystem (for various reasons I won?t get into here). Therefore, I?m > doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. > > Yesterday I did an ?mmrestripefs -r -N ? (after > suspending a disk, of course). Worked like it should. > > Today I did a ?mmrestripefs -b -P capacity -N servers>? and got: > > mmrestripefs: The total number of PIT worker threads of all participating > nodes has been exceeded to safely restripe the file system. The total > number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode > of the participating nodes, cannot exceed 31. Reissue the command with a > smaller set of participating nodes (-N option) and/or lower the > pitWorkerThreadsPerNode configure setting. By default the file system > manager node is counted as a participating node. > mmrestripefs: Command failed. Examine previous error messages to determine > cause. > > So there must be some difference in how the ?-r? and ?-b? options > calculate the number of PIT worker threads. I did an ?mmfsadm dump all | > grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem > manager node ? they all say the same thing: > > pitWorkerThreadsPerNode 0 > > Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > *Kevin.Buterbaugh at vanderbilt.edu* - > (615)875-9633 <(615)%20875-9633> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 17:49:20 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 12:49:20 -0400 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu><982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. 
that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 17:56:26 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 16:56:26 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram > wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. 
PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? 
on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 18:15:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 17:15:16 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: <8E68031C-8362-468B-873F-2B3D3B2A15B7@vanderbilt.edu> Hi Stephen, My apologies - Jordan?s response had been snagged by the University's SPAM filter (I went and checked and found it after receiving your e-mail)? Kevin On May 4, 2017, at 12:04 PM, Stephen Ulmer > wrote: Look at Jordan?s answer, he explains what significance 0 has. In short, GPFS will use one thread per core per server, so they could add to 31 quickly. ;) -- Stephen On May 4, 2017, at 12:56 PM, Buterbaugh, Kevin L > wrote: Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram > wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". 
GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. 
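In command form, one pass of that disk-migration loop might look like the sketch below, with QOS keeping the maintenance traffic throttled (the file system name, disk/node names and the IOPS ceiling are placeholders, not values taken from this thread):

# throttle maintenance commands (restripe, deldisk, ...) while leaving user I/O alone
mmchqos gpfs1 --enable pool=*,maintenance=300IOPS,other=unlimited

# drain one disk: suspend it, migrate its data off, then remove it
mmchdisk gpfs1 suspend -d "nsd_old_01"
mmrestripefs gpfs1 -r -N nsd01,nsd02,nsd03,nsd04
mmdeldisk gpfs1 "nsd_old_01"

# watch what QOS is actually allowing while the restripe runs
mmlsqos gpfs1 --seconds 60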
Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 18:20:41 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 13:20:41 -0400 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu><982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu><27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: >>Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? pitWorkerThreadsPerNode -- Specifies how many threads do restripe, data movement, etc >>As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Value of 0 just indicates pitWorkerThreadsPerNode takes internal_value based on GPFS setup and file-system configuration (which can be 16 or lower) based on the following formula. 
Default is pitWorkerThreadsPerNode = MIN(16, (numberOfDisks_in_filesystem * 4) / numberOfParticipatingNodes_in_mmrestripefs + 1) For example, if you have 64 x NSDs in your file-system and you are using 8 NSD servers in "mmrestripefs -N", then pitWorkerThreadsPerNode = MIN (16, (256/8)+1) resulting in pitWorkerThreadsPerNode to take value of 16 ( default 0 will result in 16 threads doing restripe per mmrestripefs participating Node). If you want 8 NSD servers (running 4.2.2.3) to participate in mmrestripefs operation then set "mmchconfig pitWorkerThreadsPerNode=3 -N <8_NSD_Servers>" such that (8 x 3) is less than 31. Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:57 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. 
which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? 
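Putting numbers on the formula stated earlier in this note (the 32 disks and 8 NSD servers are taken from this thread; everything else is illustrative):

# default when pitWorkerThreadsPerNode=0: MIN(16, disks*4/nodes + 1)
disks=32; nodes=8
echo $(( (disks*4/nodes + 1) < 16 ? (disks*4/nodes + 1) : 16 ))   # -> 16 threads per node
# 8 nodes x 16 threads = 128, well above the limit of 31 enforced by 4.2.2.x
# (and the file system manager counts as a participant too, so it may really be 9 nodes)

# an explicit setting that keeps 8 participating servers under the limit (8 x 3 = 24 <= 31)
mmchconfig pitWorkerThreadsPerNode=3 -N nsd01,nsd02,nsd03,nsd04,nsd05,nsd06,nsd07,nsd08
# restart GPFS on those nodes, then verify on each one:
mmfsadm dump config | grep pitWorkerThreadsPerNode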
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 23:22:12 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 22:22:12 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov><9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be><7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi, >>So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. >>On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! This is good to hear. >> We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? If you are on 4.2.0.3 or higher, you can use workerThreads config paramter (start with value of 128, and increase in increments of 128 until MAX supported) and this setting will auto adjust values for other parameters such as prefetchThreads, worker3Threads etc. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Tuning%20Parameters In addition to trying larger file-system block-size (e.g. 4MiB or higher such that is aligns with storage volume RAID-stripe-width) and config parameters (e.g , workerThreads, ignorePrefetchLUNCount), it will be good to assess the "backend storage" performance for random I/O access pattern (with block I/O sizes in units of FS block-size) as this is more likely I/O scenario that the backend storage will experience when you have many GPFS nodes performing I/O simultaneously to the file-system (in production environment). mmcrfs has option "[-j {cluster | scatter}]". "-j scatter" would be recommended for consistent file-system performance over the lifetime of the file-system but then "-j scatter" will result in random I/O to backend storage (even though application is performing sequential I/O). 
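In command form, the two allocation maps differ only in the -j flag at file system creation time (a sketch -- the device name, stanza file and 4 MiB block size are placeholders; pick a block size that lines up with your RAID stripe width):

# consistent long-term behaviour; the backend sees random I/O
mmcrfs fs1 -F nsd.stanza -B 4M -j scatter

# can look faster for sequential tests at small scale, degrades as the fs fills and fragments
mmcrfs fs1 -F nsd.stanza -B 4M -j cluster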
For your test purposes, you may assess the GPFS file-system performance by mmcrfs with "-j cluster" and you may see good sequential results (compared to -j scatter) for lower client counts but as you scale the client counts the combined workload can result in "-j scatter" to backend storage (limiting the FS performance to random I/O performance of the backend storage). [snip from mmcrfs] layoutMap={scatter | cluster} Specifies the block allocation map type. When allocating blocks for a given file, GPFS first uses a round?robin algorithm to spread the data across all disks in the storage pool. After a disk is selected, the location of the data block on the disk is determined by the block allocation map type. If cluster is specified, GPFS attempts to allocate blocks in clusters. Blocks that belong to a particular file are kept adjacent to each other within each cluster. If scatter is specified, the location of the block is chosen randomly. The cluster allocation method may provide better disk performance for some disk subsystems in relatively small installations. The benefits of clustered block allocation diminish when the number of nodes in the cluster or the number of disks in a file system increases, or when the file system?s free space becomes fragmented. The cluster allocation method is the default for GPFS clusters with eight or fewer nodes and for file systems with eight or fewer disks. The scatter allocation method provides more consistent file system performance by averaging out performance variations due to block location (for many disk subsystems, the location of the data relative to the disk edge has a substantial effect on performance). This allocation method is appropriate in most cases and is the default for GPFS clusters with more than eight nodes or file systems with more than eight disks. The block allocation map type cannot be changed after the storage pool has been created. .. .. -j {cluster | scatter} Specifies the default block allocation map type to be used if layoutMap is not specified for a given storage pool. [/snip from mmcrfs] My two cents, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 05/04/2017 09:23 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We found out using ib_read_bw and ib_write_bw that there were some links between server and clients degraded, having a bandwith of 350MB/s strangely, nsdperf did not report the same. It reported 12GB/s write and 9GB/s read, which was much more then we actually could achieve. So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! Many thanks for the help !! We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? Cheers! Kenneth On 21/04/17 22:27, Kumaran Rajaram wrote: Hi Kenneth, As it was mentioned earlier, it will be good to first verify the raw network performance between the NSD client and NSD server using the nsdperf tool that is built with RDMA support. 
g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C In addition, since you have 2 x NSD servers it will be good to perform NSD client file-system performance test with just single NSD server (mmshutdown the other server, assuming all the NSDs have primary, server NSD server configured + Quorum will be intact when a NSD server is brought down) to see if it helps to improve the read performance + if there are variations in the file-system read bandwidth results between NSD_server#1 'active' vs. NSD_server #2 'active' (with other NSD server in GPFS "down" state). If there is significant variation, it can help to isolate the issue to particular NSD server (HW or IB issue?). You can issue "mmdiag --waiters" on NSD client as well as NSD servers during your dd test, to verify if there are unsual long GPFS waiters. In addition, you may issue Linux "perf top -z" command on the GPFS node to see if there is high CPU usage by any particular call/event (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been set to low value from the default 16M, then it can cause RDMA completion threads to go CPU bound ). Please verify some performance scenarios detailed in Chapter 22 in Spectrum Scale Problem Determination Guide (link below). https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc Thanks, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/21/2017 11:43 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. 
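Something along these lines on each NSD server while the client dd is running (a sketch; these are stock Linux tools and sysfs paths, nothing GPFS-specific):

# watch actual core frequencies and C-state residency
turbostat --interval 5

# or spot-check the governor and the current per-core frequency
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
grep "cpu MHz" /proc/cpuinfo

# if the cores idle at their lowest frequency during reads, try pinning the governor
cpupower frequency-set -g performance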
A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... 
(Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
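One way to see whether C1E (or any deeper C-state) is what the cores are dropping into (a sketch; standard Linux sysfs and cpupower, nothing GPFS-specific, and the -D 0 setting is an experiment rather than a recommendation):

# list the C-states the idle driver exposes and how much time the cores spend in them
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/time

# temporarily keep the cores out of the deeper states without rebooting
cpupower idle-set -D 0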
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From ckrafft at de.ibm.com Fri May 5 18:13:18 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Fri, 5 May 2017 19:13:18 +0200 Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve Message-ID: Hello folks, has anyone made "posotive" experiences with SCSI-3 Pers. Reserve? Is this "method" still valid for Linux? Thank you for any hints and tips! Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19235477.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From scale at us.ibm.com Fri May 5 20:18:12 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 5 May 2017 15:18:12 -0400 Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve In-Reply-To: References: Message-ID: SCSI-3 persistent reserve is still supported as documented in the FAQ. I personally do not have any experience using it. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Christoph Krafft" To: "gpfsug main discussion list" Date: 05/05/2017 01:14 PM Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello folks, has anyone made "posotive" experiences with SCSI-3 Pers. Reserve? Is this "method" still valid for Linux? Thank you for any hints and tips! Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: 
From pinto at scinet.utoronto.ca Mon May 8 17:06:22 2017
From: pinto at scinet.utoronto.ca (Jaime Pinto)
Date: Mon, 08 May 2017 12:06:22 -0400
Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable
Message-ID: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca>

We have a setup in which "cluster 0" is made up of clients only, on gpfs v3.5, i.e. no NSDs or formal storage on this primary membership.

All storage for those clients comes in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7).

We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, i.e. 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1).

Another piece of information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS.

None of the clusters have the subnets parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters).

Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NSD servers on cluster 4):
Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0
Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side).
Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0
Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs
Mon May 8 11:35:28.783 2017: Network is unreachable

I see this reference to "TLS handshake" and error 447, however according to the manual TLS is only set to be the default on 4.2 onwards, not the 4.1.1-14 we have now, where it's supposed to be EMPTY.

mmdiag --network on some of the clients gives this excerpt (broken status):
tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L
gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L
gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L
gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L
wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L

I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshooting guide is not helping).

Thanks
Jaime

---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of Toronto.
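Given the symptoms above, a few checks from one of the affected clients can narrow things down (a sketch -- the addresses and cluster names are the ones quoted in this thread, and these are generic multi-cluster checks rather than a known fix for error 447):

# can the client reach the new cluster's daemon port over the IB address?
ping -c 2 10.20.179.1
nc -vz 10.20.179.1 1191          # 1191 is the GPFS daemon port

# is the new cluster registered with the right contact nodes, key and cipherList?
mmremotecluster show all
mmremotefs show all
mmauth show all                  # both sides should agree on cipherList (AUTHONLY here)

# what does GPFS itself report for the connection state?
mmdiag --network | grep -i wos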
From S.J.Thompson at bham.ac.uk Mon May 8 17:12:35 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 8 May 2017 16:12:35 +0000 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Do you have multiple networks on the hosts? We've seen this sort of thing when rp_filter is dropping traffic with asynchronous routing. I know you said it's set to only go over IB, but if you have names that resolve onto you Ethernet, and admin name etc are not correct, it might be your problem. If you had 4.2, I'd suggest mmnetverify. I suppose that might work if you copied it out of the 4.x packages anyway? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca] Sent: 08 May 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. 
mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Mon May 8 17:23:01 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 12:23:01 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170508122301.25824jjpcvgd20dh@support.scinet.utoronto.ca> Quoting "Simon Thompson (IT Research Support)" : > Do you have multiple networks on the hosts? We've seen this sort of > thing when rp_filter is dropping traffic with asynchronous routing. > Yes Simon, All clients and servers have multiple interfaces on different networks, but we've been careful to always join nodes with the -ib0 resolution, always on IB. I can also query with 'mmlscluster' and all nodes involved are listed with the 10.20.x.x IP and -ib0 extension on their names. We don't have mmnetverify anywhere yet. Thanks Jaime > I know you said it's set to only go over IB, but if you have names > that resolve onto you Ethernet, and admin name etc are not correct, > it might be your problem. > > If you had 4.2, I'd suggest mmnetverify. I suppose that might work > if you copied it out of the 4.x packages anyway? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of > pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca] > Sent: 08 May 2017 17:06 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] help with multi-cluster setup: Network is > unreachable > > We have a setup in which "cluster 0" is made up of clients only on > gpfs v3.5, ie, no NDS's or formal storage on this primary membership. > > All storage for those clients come in a multi-cluster fashion, from > clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). > > We recently added a new storage cluster 4 (4.1.1-14), and for some > obscure reason we keep getting "Network is unreachable" during mount > by clients, even though there were no issues or errors with the > multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' > worked fine, and all clients have an entry in /etc/fstab for the file > system associated with the new cluster 4. The weird thing is that we > can mount cluster 3 fine (also 4.1). > > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. 
As far as > the IB network goes there are no problems routing/pinging around all > the clusters. So this must be internal to GPFS. > > None of the clusters have the subnet parameter set explicitly at > configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem > we need to. All have cipherList AUTHONLY. One difference is that > cluster 4 has DMAPI enabled (don't think it matters). > > Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients > during mount (10.20.179.1 is one of the NDS on cluster 4): > Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node > 10.20.179.1 failed with error 447 (client side). > Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount > wosgpfs.wos-gateway01-ib0:wosgpfs > Mon May 8 11:35:28.783 2017: Network is unreachable > > > I see this reference to "TLS handshake" and error 447, however > according to the manual this TLS is only set to be default on 4.2 > onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. > > mmdiag --network for some of the client gives this excerpt (broken status): > tapenode-ib0 10.20.83.5 > broken 233 -1 0 0 Linux/L > gpc-f114n014-ib0 10.20.114.14 > broken 233 -1 0 0 Linux/L > gpc-f114n015-ib0 10.20.114.15 > broken 233 -1 0 0 Linux/L > gpc-f114n016-ib0 10.20.114.16 > broken 233 -1 0 0 Linux/L > wos-gateway01-ib0 10.20.179.1 > broken 233 -1 0 0 Linux/L > > > > I guess I just need a hint on how to troubleshoot this situation (the > 4.1 troubleshoot guide is not helping). > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From eric.wonderley at vt.edu Mon May 8 17:34:44 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 8 May 2017 12:34:44 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. 
Vice versa...no so much. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon May 8 17:49:52 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 8 May 2017 16:49:52 +0000 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> Hi Eric, Jamie, Interesting comment as we do exactly the opposite! I always make sure that my servers are running a particular version before I upgrade any clients. Now we never mix and match major versions (i.e. 4.x and 3.x) for long ? those kinds of upgrades we do rapidly. But right now I?ve got clients running 4.2.0-3 talking just fine to 4.2.2.3 servers. To be clear, I?m not saying I?m right and Eric?s wrong at all - just an observation / data point. YMMV? Kevin On May 8, 2017, at 11:34 AM, J. Eric Wonderley > wrote: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. Vice versa...no so much. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon May 8 18:04:22 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 13:04:22 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170508130422.11171a2pqcx35p1y@support.scinet.utoronto.ca> Sorry, I made a mistake on the original description: all our clients are already on 4.1.1-7. Jaime Quoting "J. Eric Wonderley" : > Hi Jamie: > > I think typically you want to keep the clients ahead of the server in > version. I would advance the version of you client nodes. > > New clients can communicate with older versions of server nsds. Vice > versa...no so much. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From mweil at wustl.edu Mon May 8 18:07:03 2017 From: mweil at wustl.edu (Matt Weil) Date: Mon, 8 May 2017 12:07:03 -0500 Subject: [gpfsug-discuss] socketMaxListenConnections and net.core.somaxconn Message-ID: <39b63a8b-2ae7-f9a0-c1c4-319f84fa5354@wustl.edu> Hello all, what happens if we set socketMaxListenConnections to a larger number than we have clients? more memory used? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From pinto at scinet.utoronto.ca Mon May 8 18:12:38 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 13:12:38 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> Message-ID: <20170508131238.632312ooano92cxy@support.scinet.utoronto.ca> I only ask that we look beyond the trivial. The existing multi-cluster setup with mixed versions of servers already work fine with 4000+ clients on 4.1. We still have 3 legacy servers on 3.5, we already have a server on 4.1 also serving fine. The brand new 4.1 server we added last week seems to be at odds for some reason, not that obvious. Thanks Jaime Quoting "Buterbaugh, Kevin L" : > Hi Eric, Jamie, > > Interesting comment as we do exactly the opposite! > > I always make sure that my servers are running a particular version > before I upgrade any clients. Now we never mix and match major > versions (i.e. 4.x and 3.x) for long ? those kinds of upgrades we do > rapidly. But right now I?ve got clients running 4.2.0-3 talking > just fine to 4.2.2.3 servers. > > To be clear, I?m not saying I?m right and Eric?s wrong at all - just > an observation / data point. YMMV? > > Kevin > > On May 8, 2017, at 11:34 AM, J. Eric Wonderley > > wrote: > > Hi Jamie: > > I think typically you want to keep the clients ahead of the server > in version. I would advance the version of you client nodes. > > New clients can communicate with older versions of server nsds. > Vice versa...no so much. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
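On the socketMaxListenConnections question a couple of messages up: as far as I understand it, that value is simply the listen() backlog mmfsd requests, and the kernel silently caps it at net.core.somaxconn, so setting it larger than the client count should be harmless - memory is only used by connections that are actually sitting in the queue. A minimal sketch of raising the two together; 1024 is an illustrative assumption, not a recommendation:

    # raise the kernel cap first, otherwise the daemon-side setting is truncated to somaxconn
    sysctl -w net.core.somaxconn=1024
    echo 'net.core.somaxconn = 1024' > /etc/sysctl.d/99-gpfs-backlog.conf   # persist across reboots

    # assumed GPFS tunable name as given in the subject line above;
    # likely needs a daemon restart on the listening nodes to take effect
    mmchconfig socketMaxListenConnections=1024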
From valdis.kletnieks at vt.edu Mon May 8 20:48:19 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 08 May 2017 15:48:19 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <13767.1494272899@turing-police.cc.vt.edu> On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. As far as Have you verified that broadcast setting actually works, and packets aren't being discarded as martians? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon May 8 21:06:28 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 16:06:28 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <13767.1494272899@turing-police.cc.vt.edu> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> <13767.1494272899@turing-police.cc.vt.edu> Message-ID: <20170508160628.20766ng8x98ogjpg@support.scinet.utoronto.ca> Quoting valdis.kletnieks at vt.edu: > On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: > >> Another piece og information is that as far as GPFS goes all clusters >> are configured to communicate exclusively over Infiniband, each on a >> different 10.20.x.x network, but broadcast 10.20.255.255. As far as > > Have you verified that broadcast setting actually works, and packets > aren't being discarded as martians? > Yes, we have. They are fine. I'm seeing "failure to join the cluster" messages prior to the "network unreachable" in the mmfslog files, so I'm starting to suspect minor disparities between older releases of 3.5.x.x at one end and newer 4.1.x.x at the other. I'll dig a little more and report the findings. Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From UWEFALKE at de.ibm.com Tue May 9 08:16:23 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 9 May 2017 09:16:23 +0200 Subject: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Hi, Jaime, I'd suggest you trace a client while trying to connect and check what addresses it is going to talk to actually. It is a bit tedious, but you will be able to find this in the trace report file. You might also get an idea what's going wrong... Mit freundlichen Gr??en / Kind regards Dr. 
Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 05/08/2017 06:06 PM Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable Sent by: gpfsug-discuss-bounces at spectrumscale.org We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). 
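A sketch of the tracing Uwe suggests, for anyone who wants to repeat it - the node name and device are placeholders, and the option names should be checked against the mmtracectl man page for your level:

    # on (or against) the client that fails the mount
    mmtracectl --start -N problemclient-ib0
    mmmount wosgpfs                          # reproduce the failing remote mount
    mmtracectl --stop -N problemclient-ib0   # trace report is written under /tmp/mmfs by default
    # then search the trace report for the IP addresses the daemon actually tried to contact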
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Tue May 9 17:25:00 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 9 May 2017 16:25:00 +0000 Subject: [gpfsug-discuss] CES and Directory list populating very slowly Message-ID: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue May 9 18:00:22 2017 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 9 May 2017 10:00:22 -0700 Subject: [gpfsug-discuss] CES and Directory list populating very slowly In-Reply-To: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> References: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> Message-ID: ESS nodes have cache, but what matters most for this type of workloads is to have a very large metadata cache, this resides on the CES node for SMB/NFS workloads. 
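A rough sketch of what a "very large metadata cache" can translate to on the protocol nodes - the numbers are placeholder assumptions for illustration, not sizing advice:

    # raise the inode and stat caches on the CES node class only
    mmchconfig maxFilesToCache=1000000 -N cesNodes
    mmchconfig maxStatCache=2000000 -N cesNodes   # a large statcache pairs well with an LROC device
    # larger caches consume more daemon/token memory and normally need a
    # daemon restart on those nodes before they take effect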
so if you know that your client will use this 300k directory a lot you want to have a very large maxfilestocache setting on this nodes. alternative solution is to install a LROC device and configure a larger statcache, this helps especially if you have multiple larger directories and want to cache as many as possible from all of them. make sure you have enough tokenmanager and memory on them if you have multiple CES nodes and they all will have high settings. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Mark Bush To: gpfsug main discussion list Date: 05/09/2017 05:25 PM Subject: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Tue May 9 19:58:22 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 9 May 2017 14:58:22 -0400 Subject: [gpfsug-discuss] CES and Directory list populating very slowly In-Reply-To: References: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> Message-ID: If you haven't already, measure the time directly on the CES node command line skipping Windows and Samba overheads: time ls -l /path or time ls -lR /path Depending which you're interested in. From: "Sven Oehme" To: gpfsug main discussion list Date: 05/09/2017 01:01 PM Subject: Re: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org ESS nodes have cache, but what matters most for this type of workloads is to have a very large metadata cache, this resides on the CES node for SMB/NFS workloads. so if you know that your client will use this 300k directory a lot you want to have a very large maxfilestocache setting on this nodes. alternative solution is to install a LROC device and configure a larger statcache, this helps especially if you have multiple larger directories and want to cache as many as possible from all of them. make sure you have enough tokenmanager and memory on them if you have multiple CES nodes and they all will have high settings. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Mark Bush ---05/09/2017 05:25:39 PM---I have a customer who is struggling (they already have a PMR open and it?s being actively worked on From: Mark Bush To: gpfsug main discussion list Date: 05/09/2017 05:25 PM Subject: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed May 10 02:26:19 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 09 May 2017 21:26:19 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170509212619.88345qjpf9ea46kb@support.scinet.utoronto.ca> As it turned out, the 'authorized_keys' file placed in the /var/mmfs/ssl directory of the NDS for the new storage cluster 4 (4.1.1-14) needed an explicit entry of the following format for the bracket associated with clients on cluster 0: nistCompliance=off Apparently the default for 4.1.x is: nistCompliance=SP800-131A I just noticed that on cluster 3 (4.1.1-7) that entry is also present for the bracket associated with clients cluster 0. I guess the Seagate fellows that helped us install the G200 in our facility had that figured out. The original "TLS handshake" error message kind of gave me a hint of the problem, however the 4.1 installation manual specifically mentioned that this could be an issue only on 4.2 onward. The troubleshoot guide for 4.2 has this excerpt: "Ensure that the configurations of GPFS and the remote key management (RKM) server are compatible when it comes to the version of the TLS protocol used upon key retrieval (GPFS uses the nistCompliance configuration variable to control that). In particular, if nistCompliance=SP800-131A is set in GPFS, ensure that the TLS v1.2 protocol is enabled in the RKM server. If this does not resolve the issue, contact the IBM Support Center.". So, how am I to know that nistCompliance=off is even an option? For backward compatibility with the older storage clusters on 3.5 the clients cluster need to have nistCompliance=off I hope this helps the fellows in mixed versions environments, since it's not obvious from the 3.5/4.1 installation manuals or the troubleshoots guide what we should do. Thanks everyone for the help. Jaime Quoting "Uwe Falke" : > Hi, Jaime, > I'd suggest you trace a client while trying to connect and check what > addresses it is going to talk to actually. It is a bit tedious, but you > will be able to find this in the trace report file. You might also get an > idea what's going wrong... > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. 
Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thomas Wolter > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 05/08/2017 06:06 PM > Subject: [gpfsug-discuss] help with multi-cluster setup: Network is > unreachable > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > We have a setup in which "cluster 0" is made up of clients only on > gpfs v4.1, ie, no NDS's or formal storage on this primary membership. > > All storage for those clients come in a multi-cluster fashion, from > clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). > > We recently added a new storage cluster 4 (4.1.1-14), and for some > obscure reason we keep getting "Network is unreachable" during mount > by clients, even though there were no issues or errors with the > multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' > worked fine, and all clients have an entry in /etc/fstab for the file > system associated with the new cluster 4. The weird thing is that we > can mount cluster 3 fine (also 4.1). > > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. As far as > the IB network goes there are no problems routing/pinging around all > the clusters. So this must be internal to GPFS. > > None of the clusters have the subnet parameter set explicitly at > configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem > we need to. All have cipherList AUTHONLY. One difference is that > cluster 4 has DMAPI enabled (don't think it matters). > > Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients > during mount (10.20.179.1 is one of the NDS on cluster 4): > Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node > 10.20.179.1 failed with error 447 (client side). > Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount > wosgpfs.wos-gateway01-ib0:wosgpfs > Mon May 8 11:35:28.783 2017: Network is unreachable > > > I see this reference to "TLS handshake" and error 447, however > according to the manual this TLS is only set to be default on 4.2 > onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. 
> > mmdiag --network for some of the client gives this excerpt (broken > status): > tapenode-ib0 10.20.83.5 > broken 233 -1 0 0 Linux/L > gpc-f114n014-ib0 10.20.114.14 > broken 233 -1 0 0 Linux/L > gpc-f114n015-ib0 10.20.114.15 > broken 233 -1 0 0 Linux/L > gpc-f114n016-ib0 10.20.114.16 > broken 233 -1 0 0 Linux/L > wos-gateway01-ib0 10.20.179.1 > broken 233 -1 0 0 Linux/L > > > > I guess I just need a hint on how to troubleshoot this situation (the > 4.1 troubleshoot guide is not helping). > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Robert.Oesterlin at nuance.com Wed May 10 15:13:56 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 May 2017 14:13:56 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> I could not find any way to find out what the issue is here - ideas? [root]# mmhealth cluster show nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. I?ve tried it multiple times, always returns this error. I recently switched the cluster over to 4.2.2 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed May 10 16:46:21 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 10 May 2017 11:46:21 -0400 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> References: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> Message-ID: <3939.1494431181@turing-police.cc.vt.edu> On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed May 10 16:52:35 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 10 May 2017 15:52:35 +0000 Subject: [gpfsug-discuss] patched rsync question Message-ID: <27CCB813-DF05-49A6-A510-51499DFF4B85@vanderbilt.edu> Hi All, We are using the patched version of rsync: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, gpfs, iconv, symtimes rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details. to copy files from our old GPFS filesystem to our new GPFS filesystem. Unfortunately, for one group I inadvertently left off the ?-A? option when rsync?ing them, so it didn?t preserve their ACL?s. The original files were deleted, but we were able to restore them from a backup taken on April 25th. I looked, but cannot find any option to rsync that would only update based on ACL?s / permissions. Out of 13,000+ files, it appears that 910 have been modified in the interim. So what I am thinking of doing is rerunning the rsync from the restore directory to the new filesystem directory with the -A option. I?ll test this with ??dry-run? first, of course. I am thinking that this will update the ACL?s on all but the 910 modified files, which would then have to be dealt with on a case by case basis. Anyone have any comments on this idea or any better ideas? Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed May 10 17:20:39 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 May 2017 16:20:39 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Yea, it?s fine. I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. Seems a bit fragile :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? From Kevin.Buterbaugh at Vanderbilt.Edu Wed May 10 18:57:11 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 10 May 2017 17:57:11 +0000 Subject: [gpfsug-discuss] patched rsync question In-Reply-To: References: <27CCB813-DF05-49A6-A510-51499DFF4B85@vanderbilt.edu> Message-ID: Hi Stephen, Thanks for the suggestion. 
We thought about doing something similar to this but in the end I just ran a: rsync -aAvu /old/location /new/location And that seems to have updated the ACL?s on everything except the 910 modified files, which we?re dealing with in a manner similar to what you suggest below. Thanks all? Kevin On May 10, 2017, at 12:51 PM, Stephen Ulmer > wrote: If there?s only 13K files, and you don?t want to copy them, why use rsync at all? I think your solution is: * check every restored for for an ACL * copy the ACL to the same file in the new file system What about generating a file list and then just traversing it dumping the ACL from the restored file and adding it to the new file (after transforming the path). You could probably do the dump/assign with a pipe and not even write the ACLs down. You can even multi-thread the process if you have GNU xargs. Something like (untested): xargs -P num_cores_or_something ./helper_script.sh < list_of_files Where helper_script.sh is (also untested): NEWPATH=$( echo $1 | sed -e ?s/remove/replace/' ) getfacl $1 | setfacl $NEWPATH -- Stephen On May 10, 2017, at 11:52 AM, Buterbaugh, Kevin L > wrote: Hi All, We are using the patched version of rsync: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, gpfs, iconv, symtimes rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details. to copy files from our old GPFS filesystem to our new GPFS filesystem. Unfortunately, for one group I inadvertently left off the ?-A? option when rsync?ing them, so it didn?t preserve their ACL?s. The original files were deleted, but we were able to restore them from a backup taken on April 25th. I looked, but cannot find any option to rsync that would only update based on ACL?s / permissions. Out of 13,000+ files, it appears that 910 have been modified in the interim. So what I am thinking of doing is rerunning the rsync from the restore directory to the new filesystem directory with the -A option. I?ll test this with ??dry-run? first, of course. I am thinking that this will update the ACL?s on all but the 910 modified files, which would then have to be dealt with on a case by case basis. Anyone have any comments on this idea or any better ideas? Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From usa-principal at gpfsug.org Wed May 10 21:01:05 2017 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Wed, 10 May 2017 13:01:05 -0700 Subject: [gpfsug-discuss] Presentations Uploaded - SSUG Event @NERSC April 4-5 Message-ID: <7501c112d2e6ff79f9c89907a292ddab@webmail.gpfsug.org> All, I have just updated the Presentations page with 19 talks from the US SSUG event last month. The videos should be available on YouTube soon. I'll announce that separately. 
https://www.spectrumscale.org/presentations/ Cheers, Kristy From Anna.Wagner at de.ibm.com Thu May 11 12:28:22 2017 From: Anna.Wagner at de.ibm.com (Anna Christina Wagner) Date: Thu, 11 May 2017 13:28:22 +0200 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> References: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Message-ID: Hello Bob, 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. So a short explanation: We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not know, that it is the CSM and will not start the corresponding service for that. If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) Mit freundlichen Gr??en / Kind regards Wagner, Anna Christina Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10.05.2017 18:21 Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error Sent by: gpfsug-discuss-bounces at spectrumscale.org Yea, it?s fine. I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. Seems a bit fragile :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu May 11 13:05:14 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 11 May 2017 08:05:14 -0400 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: References: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Message-ID: I?ve also been exploring the mmhealth and gpfsgui for the first time this week. I have a test cluster where I?m trying the new stuff. 
Running 4.2.2-2 mmhealth cluster show says everyone is in nominal status: Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 12 0 0 12 0 GPFS 12 0 0 12 0 NETWORK 12 0 0 12 0 FILESYSTEM 0 0 0 0 0 DISK 0 0 0 0 0 GUI 1 0 0 1 0 PERFMON 12 0 0 12 0 However on the GUI there is conflicting information: 1) Home page shows 3/8 NSD Servers unhealthy 2) Home page shows 3/21 Nodes unhealthy ? where is it getting this notion? ? there are only 12 nodes in the whole cluster! 3) clicking on either NSD Servers or Nodes leads to the monitoring page where the top half spins forever, bottom half is content-free. I may have installed the pmsensors RPM on a couple of other nodes back in early April, but have forgotten which ones. They are in the production cluster. Also, the storage in this sandbox cluster has not been turned into a filesystem yet. There are a few dozen free NSDs. Perhaps the ?FILESYSTEM CHECKING? status is somehow wedging up the GUI? Node name: storage005.oscar.ccv.brown.edu Node status: HEALTHY Status Change: 15 hours ago Component Status Status Change Reasons ------------------------------------------------------ GPFS HEALTHY 16 hours ago - NETWORK HEALTHY 16 hours ago - FILESYSTEM CHECKING 16 hours ago - GUI HEALTHY 15 hours ago - PERFMON HEALTHY 16 hours ago I?ve tried restarting the GUI service and also rebooted the GUI server, but it comes back looking the same. Any thoughts? > On May 11, 2017, at 7:28 AM, Anna Christina Wagner wrote: > > Hello Bob, > > 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. > > So a short explanation: > We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands > took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager > was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not > know, that it is the CSM and will not start the corresponding service for that. > > > If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) > > Mit freundlichen Gr??en / Kind regards > > Wagner, Anna Christina > > Software Engineer, Spectrum Scale Development > IBM Systems > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz > Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 10.05.2017 18:21 > Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Yea, it?s fine. > > I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. > > Seems a bit fragile :-) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: > > On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > > > [root]# mmhealth cluster show > > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. 
It may be in an failover process. Please try again in a few seconds. > > Does 'mmlsmgr' return something sane? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu May 11 13:36:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 11 May 2017 12:36:47 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <9C601DFD-16FF-40E7-8D46-16033C443428@nuance.com> Thanks Anna, I will email you directly. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Anna Christina Wagner Reply-To: gpfsug main discussion list Date: Thursday, May 11, 2017 at 6:28 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] "mmhealth cluster show" returns error Hello Bob, 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. So a short explanation: We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not know, that it is the CSM and will not start the corresponding service for that. If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) Mit freundlichen Gr??en / Kind regards Wagner, Anna Christina Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christian.Fey at sva.de Thu May 11 16:37:43 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Thu, 11 May 2017 15:37:43 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Message-ID: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. 
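For reference, the slot arithmetic autorid presumably applies to those values - a sketch, not verified against the code: the allocation pool takes the first slot and BUILTIN is usually registered before the AD domain is ever seen, which pushes the domain one slot further than intended:

    # range = 200000-2999999, rangesize = 800000  ->  slots of 800000 IDs each
    #   slot 0:  200000 -  999999   (ALLOC pool)
    #   slot 1: 1000000 - 1799999   (BUILTIN, S-1-5-32)
    #   slot 2: 1800000 - 2599999   (first slot left over for the AD domain)
    /usr/lpp/mmfs/bin/net idmap get ranges   # shows which SID ended up in which slot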
Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu May 11 18:49:02 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 11 May 2017 17:49:02 +0000 Subject: [gpfsug-discuss] Edge case failure mode Message-ID: Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. 
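For anyone wanting to repeat those local checks, a sketch of the sort of thing meant here - the operation names are assumed from the 4.2.x mmnetverify syntax, and the rp_filter check anticipates what turns out to matter further down:

    # local-cluster network checks run from a CES node
    mmnetverify ping daemon-port tscomm-port -N all --verbose

    # on a multi-homed node, check whether strict reverse-path filtering is enabled
    # (1 = strict, 2 = loose); strict mode can drop asymmetrically routed traffic
    sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter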
We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon From bbanister at jumptrading.com Thu May 11 18:58:18 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 11 May 2017 17:58:18 +0000 Subject: [gpfsug-discuss] Edge case failure mode In-Reply-To: References: Message-ID: <87b204b6e245439bb475792cf3672aa5@jumptrading.com> Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA56.41F66270] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). 
That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From S.J.Thompson at bham.ac.uk Thu May 11 19:05:08 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 11 May 2017 18:05:08 +0000 Subject: [gpfsug-discuss] Edge case failure mode Message-ID: Cheers Bryan ... 
http://goo.gl/YXitIF Points to: (Outlook/mailing list is line breaking and cutting the trailing 0) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030 Simon From: > on behalf of "bbanister at jumptrading.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 11 May 2017 at 18:58 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Edge case failure mode Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA56.41F66270] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. 
Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From pinto at scinet.utoronto.ca Thu May 11 19:17:06 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 11 May 2017 14:17:06 -0400 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation In-Reply-To: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> Message-ID: <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drove people's attention. I hope to get some comments now. Thanks Jaime Quoting "Jaime Pinto" : > In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota > once a month, usually after the massive monthly purge. > > I noticed that starting with the GSS and ESS appliances under 3.5 that > I needed to run mmcheckquota more often, at least once a week, or as > often as daily, to clear the slippage errors in the accounting > information, otherwise users complained that they were hitting their > quotas, even throughout they deleted a lot of stuff. > > More recently we adopted a G200 appliance (1.8PB), with v4.1, and now > things have gotten worst, and I have to run it twice daily, just in > case. > > So, what I am missing? Is their a parameter since 3.5 and through 4.1 > that we can set, so that GPFS will reconcile the quota accounting > internally more often and on its own? 
> > Thanks > Jaime > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From bbanister at jumptrading.com Thu May 11 19:20:47 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 11 May 2017 18:20:47 +0000 Subject: [gpfsug-discuss] Edge case failure mode In-Reply-To: References: Message-ID: <607e7c81dd3349fd8c0a8602d1938e3b@jumptrading.com> I was wondering why that 0 was left on that line alone... hahaha, -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 1:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Edge case failure mode Cheers Bryan ... http://goo.gl/YXitIF Points to: (Outlook/mailing list is line breaking and cutting the trailing 0) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030 Simon From: > on behalf of "bbanister at jumptrading.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 11 May 2017 at 18:58 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Edge case failure mode Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA59.65CF7300] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. 
(Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From UWEFALKE at de.ibm.com Thu May 11 20:42:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 May 2017 21:42:29 +0200 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quotaaccountingreconciliation In-Reply-To: <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Message-ID: Hi, Jaimie, we got the same problem, also with a GSS although I suppose it's rather to do with the code above GNR, but who knows. I have a PMR open for quite some time (and had others as well). Seems like things improved by upgrading the FS version, but atre not gone. However, these issues are to be solved via PMRs. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" To: "gpfsug main discussion list" , "Jaime Pinto" Date: 05/11/2017 08:17 PM Subject: Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation Sent by: gpfsug-discuss-bounces at spectrumscale.org Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drove people's attention. I hope to get some comments now. Thanks Jaime Quoting "Jaime Pinto" : > In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota > once a month, usually after the massive monthly purge. > > I noticed that starting with the GSS and ESS appliances under 3.5 that > I needed to run mmcheckquota more often, at least once a week, or as > often as daily, to clear the slippage errors in the accounting > information, otherwise users complained that they were hitting their > quotas, even throughout they deleted a lot of stuff. > > More recently we adopted a G200 appliance (1.8PB), with v4.1, and now > things have gotten worst, and I have to run it twice daily, just in > case. > > So, what I am missing? 
Is their a parameter since 3.5 and through 4.1 > that we can set, so that GPFS will reconcile the quota accounting > internally more often and on its own? > > Thanks > Jaime > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From damir.krstic at gmail.com Fri May 12 11:42:19 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 12 May 2017 10:42:19 +0000 Subject: [gpfsug-discuss] connected v. datagram mode Message-ID: I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and are in datagram mode. In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. What is is the right thing to do? Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri May 12 12:43:01 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 12 May 2017 07:43:01 -0400 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quotaaccountingreconciliation In-Reply-To: References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Message-ID: <20170512074301.91955kiad218rl51@support.scinet.utoronto.ca> I like to give the community a chance to reflect on the issue, check their own installations and possibly give us all some comments. If in a few more days we still don't get any hints I'll have to open a couple of support tickets (IBM, DDN, Seagate, ...). Cheers Jaime Quoting "Uwe Falke" : > Hi, Jaimie, > > we got the same problem, also with a GSS although I suppose it's rather to > do with the code above GNR, but who knows. > I have a PMR open for quite some time (and had others as well). > Seems like things improved by upgrading the FS version, but atre not gone. > > > However, these issues are to be solved via PMRs. > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thomas Wolter > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Jaime Pinto" > Date: 05/11/2017 08:17 PM > Subject: Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting > reconciliation > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Just bumping up. > When I first posted this subject at the end of March there was a UG > meeting that drove people's attention. > > I hope to get some comments now. > > Thanks > Jaime > > Quoting "Jaime Pinto" : > >> In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota >> once a month, usually after the massive monthly purge. >> >> I noticed that starting with the GSS and ESS appliances under 3.5 that >> I needed to run mmcheckquota more often, at least once a week, or as >> often as daily, to clear the slippage errors in the accounting >> information, otherwise users complained that they were hitting their >> quotas, even throughout they deleted a lot of stuff. >> >> More recently we adopted a G200 appliance (1.8PB), with v4.1, and now >> things have gotten worst, and I have to run it twice daily, just in >> case. >> >> So, what I am missing? Is their a parameter since 3.5 and through 4.1 >> that we can set, so that GPFS will reconcile the quota accounting >> internally more often and on its own? >> >> Thanks >> Jaime >> > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jonathon.anderson at colorado.edu Fri May 12 15:43:55 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 12 May 2017 14:43:55 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: Message-ID: This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? 
and detects whether data is received completely and in the correct order. The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. ~jonathon On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and are in datagram mode. In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. What is is the right thing to do? Thanks in advance. Damir From aaron.s.knister at nasa.gov Fri May 12 15:48:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 12 May 2017 10:48:14 -0400 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: Message-ID: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> For what it's worth we've seen *significantly* better performance of streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. -Aaron On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received completely and in the correct order. > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. > > ~jonathon > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > are in datagram mode. > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > What is is the right thing to do? > > > Thanks in advance. > Damir > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From janfrode at tanso.net Fri May 12 16:03:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 12 May 2017 15:03:03 +0000 Subject: [gpfsug-discuss] connected v. 
datagram mode In-Reply-To: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: -------------- Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. --------------- -jf fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : > For what it's worth we've seen *significantly* better performance of > streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. > > -Aaron > > On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > > This won?t tell you which to use; but datagram mode and connected mode > in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that > there?s no checking/retry built into the protocol; the other is ?reliable? > and detects whether data is received completely and in the correct order. > > > > The last advice I heard for traditional IB was that the overhead of > connected mode isn?t worth it, particularly if you?re using IPoIB (where > you?re likely to be using TCP anyway). That said, on our OPA network we?re > seeing the opposite advice; so I, to, am often unsure what the most correct > configuration would be for any given fabric. > > > > ~jonathon > > > > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Damir Krstic" behalf of damir.krstic at gmail.com> wrote: > > > > I never fully understood the difference between connected v. > datagram mode beside the obvious packet size difference. Our NSD servers > (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our > 700+ clients are running RH6 and > > are in datagram mode. > > > > > > In a month we are upgrading our cluster to RedHat 7 and are debating > whether to leave the compute nodes in datagram mode or whether to switch > them to connected mode. > > What is is the right thing to do? > > > > > > Thanks in advance. > > Damir > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Fri May 12 16:05:47 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 12 May 2017 15:05:47 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: It may be true that you should always favor connected mode; but those instructions look like they?re specifically only talking about when you have bonded interfaces. 
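For anyone who has not seen one, the "slave-bond interface script" the guide refers to is just an ifcfg file under /etc/sysconfig/network-scripts. A minimal sketch, with device and bond names illustrative:

  # /etc/sysconfig/network-scripts/ifcfg-ib0  (IPoIB port enslaved to bond0)
  TYPE=InfiniBand
  DEVICE=ib0
  ONBOOT=yes
  BOOTPROTO=none
  MASTER=bond0
  SLAVE=yes
  # the default is datagram; this line switches the port to connected mode
  CONNECTED_MODE=yes

The large IPoIB MTU that connected mode permits (commonly 65520) is then normally set on the bond interface itself (ifcfg-bond0), whereas datagram mode is limited to the fabric path MTU (2044 with the usual 2K IB MTU).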
~jonathon On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jan-Frode Myklebust" wrote: I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: -------------- Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. --------------- -jf fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : For what it's worth we've seen *significantly* better performance of streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. -Aaron On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received completely and in the correct order. > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. > > ~jonathon > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > are in datagram mode. > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > What is is the right thing to do? > > > Thanks in advance. > Damir > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From usa-principal at gpfsug.org Fri May 12 17:03:46 2017 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Fri, 12 May 2017 09:03:46 -0700 Subject: [gpfsug-discuss] YouTube Videos of Talks - April 4-5 US SSUG Meeting at NERSC Message-ID: All, The YouTube videos are now available on the Spectrum Scale/GPFS User Group channel, and will be on the IBM channel as well in the near term. https://www.youtube.com/playlist?list=PLrdepxEIEyCp1TqZ2z3WfGOgqO9oY01xY Cheers, Kristy From laurence at qsplace.co.uk Sat May 13 00:27:19 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sat, 13 May 2017 00:27:19 +0100 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: It also depends on the adapter. 
We have seen better performance using datagram with MLNX adapters however we see better in connected mode when using Intel truescale. Again as Jonathon has mentioned we have also seen better performance when using connected mode on active/slave bonded interface (even between a mixed MLNX/TS fabric). There is also a significant difference in the MTU size you can use in datagram vs connected mode, with datagram being limited to 2044 (if memory serves) there as connected mode can use 65536 (again if memory serves). I typically now run qperf and nsdperf benchmarks to find the best configuration. -- Lauz On 12/05/2017 16:05, Jonathon A Anderson wrote: > It may be true that you should always favor connected mode; but those instructions look like they?re specifically only talking about when you have bonded interfaces. > > ~jonathon > > > On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jan-Frode Myklebust" wrote: > > > > > I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: > > -------------- > Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These > scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. > --------------- > > > -jf > fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : > > > For what it's worth we've seen *significantly* better performance of > streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. > > -Aaron > > On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received > completely and in the correct order. > > > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am > often unsure what the most correct configuration would be for any given fabric. > > > > ~jonathon > > > > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" on behalf of damir.krstic at gmail.com> wrote: > > > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > > are in datagram mode. > > > > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > > What is is the right thing to do? > > > > > > Thanks in advance. 
> > Damir > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at > spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Sun May 14 10:16:12 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Sun, 14 May 2017 11:16:12 +0200 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: hi all, does anyone know about the impact of memory usage? afaik, connected mode keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2) instructions suggested not to use CM for large-ish (>128 nodes at that time) clusters. we never turned it back on, and now have 700 nodes. wrt IPoIB MTU, UD can have up to 4042 (or something like that) with correct opensm configuration. stijn On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote: > It also depends on the adapter. > > We have seen better performance using datagram with MLNX adapters > however we see better in connected mode when using Intel truescale. > Again as Jonathon has mentioned we have also seen better performance > when using connected mode on active/slave bonded interface (even between > a mixed MLNX/TS fabric). > > There is also a significant difference in the MTU size you can use in > datagram vs connected mode, with datagram being limited to 2044 (if > memory serves) there as connected mode can use 65536 (again if memory > serves). > > I typically now run qperf and nsdperf benchmarks to find the best > configuration. > > -- Lauz > > On 12/05/2017 16:05, Jonathon A Anderson wrote: >> It may be true that you should always favor connected mode; but those >> instructions look like they?re specifically only talking about when >> you have bonded interfaces. >> >> ~jonathon >> >> >> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of Jan-Frode Myklebust" >> > janfrode at tanso.net> wrote: >> >> I also don't know much about this, but the ESS >> quick deployment guide is quite clear on the we should use connected >> mode for IPoIB: >> -------------- >> Note: If using bonded IP over IB, do the following: Ensure that >> the CONNECTED_MODE=yes statement exists in the corresponding >> slave-bond interface scripts located in /etc/sysconfig/network-scripts >> directory of the management server and I/O server nodes. These >> scripts are created as part of the IP over IB bond creation. An >> example of the slave-bond interface with the modification is shown below. >> --------------- >> -jf >> fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister >> : >> For what it's worth we've seen *significantly* better >> performance of >> streaming benchmarks of IPoIB with connected mode vs datagram >> mode on IB. >> -Aaron >> On 5/12/17 10:43 AM, Jonathon A Anderson wrote: >> > This won?t tell you which to use; but datagram mode and >> connected mode in IB is roughly analogous to UDB vs TCP in IP. One is >> ?unreliable? 
in that there?s no checking/retry built into the >> protocol; the other is ?reliable? and detects whether data is received >> completely and in the correct order. >> > >> > The last advice I heard for traditional IB was that the >> overhead of connected mode isn?t worth it, particularly if you?re >> using IPoIB (where you?re likely to be using TCP anyway). That said, >> on our OPA network we?re seeing the opposite advice; so I, to, am >> often unsure what the most correct configuration would be for >> any given fabric. >> > >> > ~jonathon >> > >> > >> > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org >> on behalf of Damir Krstic" > on behalf of damir.krstic at gmail.com> wrote: >> > >> > I never fully understood the difference between connected >> v. datagram mode beside the obvious packet size difference. Our NSD >> servers (ESS GL6 nodes) are installed with RedHat 7 and are in >> connected mode. Our 700+ clients are running RH6 and >> > are in datagram mode. >> > >> > >> > In a month we are upgrading our cluster to RedHat 7 and are >> debating whether to leave the compute nodes in datagram mode or >> whether to switch them to connected mode. >> > What is is the right thing to do? >> > >> > >> > Thanks in advance. >> > Damir >> > >> > >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at >> spectrumscale.org >> > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at >> spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Mon May 15 00:41:13 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 14 May 2017 23:41:13 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: <82aac761681744b28e7010f22ef7cb81@exch1-cdc.nexus.csiro.au> I asked Mellanox about this nearly 2 years ago and was told around the 500 node mark there will be a tipping point and that datagram will be more useful after that. Memory utilisation was the issue. I've also seen references to smaller node counts more recently as well as generic recommendations to use datagram for any size cluster. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stijn De Weirdt Sent: Sunday, 14 May 2017 7:16 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] connected v. datagram mode hi all, does anyone know about the impact of memory usage? afaik, connected mode keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2) instructions suggested not to use CM for large-ish (>128 nodes at that time) clusters. we never turned it back on, and now have 700 nodes. wrt IPoIB MTU, UD can have up to 4042 (or something like that) with correct opensm configuration. 
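(Side note on the opensm piece: the limit being described is the fabric/partition MTU. With the stock 2K IB MTU, IPoIB datagram mode tops out at 2044 bytes; if the subnet manager advertises a 4K MTU on the partition, that rises to roughly 4092. A sketch of the relevant line for a distro opensm, assuming the default partition:

  # /etc/opensm/partitions.conf -- mtu=5 selects a 4K IB MTU
  Default=0x7fff, ipoib, mtu=5 : ALL=full;

opensm needs a restart afterwards, and every HCA and switch port in the path has to support 4K MTU for it to take effect.)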
stijn On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote: > It also depends on the adapter. > > We have seen better performance using datagram with MLNX adapters > however we see better in connected mode when using Intel truescale. > Again as Jonathon has mentioned we have also seen better performance > when using connected mode on active/slave bonded interface (even > between a mixed MLNX/TS fabric). > > There is also a significant difference in the MTU size you can use in > datagram vs connected mode, with datagram being limited to 2044 (if > memory serves) there as connected mode can use 65536 (again if memory > serves). > > I typically now run qperf and nsdperf benchmarks to find the best > configuration. > > -- Lauz > > On 12/05/2017 16:05, Jonathon A Anderson wrote: >> It may be true that you should always favor connected mode; but those >> instructions look like they?re specifically only talking about when >> you have bonded interfaces. >> >> ~jonathon >> >> >> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of Jan-Frode Myklebust" >> > janfrode at tanso.net> wrote: >> >> I also don't know much about this, but the ESS >> quick deployment guide is quite clear on the we should use connected >> mode for IPoIB: >> -------------- >> Note: If using bonded IP over IB, do the following: Ensure that >> the CONNECTED_MODE=yes statement exists in the corresponding >> slave-bond interface scripts located in >> /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These >> scripts are created as part of the IP over IB bond creation. An >> example of the slave-bond interface with the modification is shown below. >> --------------- >> -jf >> fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister >> : >> For what it's worth we've seen *significantly* better >> performance of >> streaming benchmarks of IPoIB with connected mode vs datagram >> mode on IB. >> -Aaron >> On 5/12/17 10:43 AM, Jonathon A Anderson wrote: >> > This won?t tell you which to use; but datagram mode and >> connected mode in IB is roughly analogous to UDB vs TCP in IP. One is >> ?unreliable? in that there?s no checking/retry built into the >> protocol; the other is ?reliable? and detects whether data is received >> completely and in the correct order. >> > >> > The last advice I heard for traditional IB was that the >> overhead of connected mode isn?t worth it, particularly if you?re >> using IPoIB (where you?re likely to be using TCP anyway). That said, >> on our OPA network we?re seeing the opposite advice; so I, to, am >> often unsure what the most correct configuration would be for >> any given fabric. >> > >> > ~jonathon >> > >> > >> > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org >> on behalf of Damir Krstic" > on behalf of damir.krstic at gmail.com> wrote: >> > >> > I never fully understood the difference between connected >> v. datagram mode beside the obvious packet size difference. Our NSD >> servers (ESS GL6 nodes) are installed with RedHat 7 and are in >> connected mode. Our 700+ clients are running RH6 and >> > are in datagram mode. >> > >> > >> > In a month we are upgrading our cluster to RedHat 7 and are >> debating whether to leave the compute nodes in datagram mode or >> whether to switch them to connected mode. >> > What is is the right thing to do? >> > >> > >> > Thanks in advance. 
>> > Damir >> > >> > >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at >> spectrumscale.org >> > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at >> spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From varun.mittal at in.ibm.com Mon May 15 19:39:28 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Tue, 16 May 2017 00:09:28 +0530 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration In-Reply-To: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: "Fey, Christian" To: gpfsug main discussion list Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. 
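(A worked example of the range arithmetic may help here. With the quoted settings, autorid carves range = 200000-2999999 into fixed slots of rangesize = 800000 IDs and hands the slots out in the order they are first needed, not by static assignment:

  # slot N spans 200000 + N*800000 ... 200000 + (N+1)*800000 - 1
  RANGE 0:  200000 -  999999   ALLOC (autorid bookkeeping)
  RANGE 1: 1000000 - 1799999   S-1-5-32 (BUILTIN), claimed first
  RANGE 2: 1800000 - 2599999   the AD domain

So the domain only begins at 1000000 if it, rather than BUILTIN, happens to claim slot 1; that ordering is decided at first lookup and persisted in autorid.tdb, which is why the result differs from the old static rid range.)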
Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue May 16 10:40:09 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 16 May 2017 10:40:09 +0100 Subject: [gpfsug-discuss] AFM Prefetch Missing Files Message-ID: I know it was said at the User group meeting last week that older versions of afm prefetch miss empty files and that this is now fixed in 4.2.2.3. We are in the middle of trying to migrate our files to a new filesystem, and since that was said I'm double checking for any mistakes etc. Anyway it looks like AFM prefetch also misses symlinks pointing to files that that don't exist. ie "dangling symlinks" or ones that point to files that either have not been created yet or have subsequently been deleted. or when files have been decompressed and a symlink extracted that points somewhere that is never going to exist. I'm still checking this, and as yet it does not look like its a data loss issue, but it could still cause things to not quiet work once the file migration is complete. Does anyone else know of any other types of files that might be missed and I need to be aware of? We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" using a gpfs policy to collect the list, we are using GPFS Multi-cluster to connect the two filesystems not NFS.... Thanks in advanced Peter Childs From service at metamodul.com Tue May 16 20:17:55 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Tue, 16 May 2017 21:17:55 +0200 (CEST) Subject: [gpfsug-discuss] Maximum network delay for a Quorum Buster node Message-ID: <1486746025.249506.1494962275357@email.1und1.de> An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Wed May 17 12:26:44 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 17 May 2017 11:26:44 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Hi All, I help to run a gpfs cluster at the Met Office, Exeter, UK. The cluster is running GPFS 4.2.2.2, it's used with slurm for batch work - primarily for postprocessing weather and climate change model data generated from our HPC. We currently have 8 NSD nodes with approx 3PB of storage with 70+ client nodes. Kind Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neil.wilson at metoffice.gov.uk Wed May 17 12:44:01 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 17 May 2017 11:44:01 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: Hello all, Does anyone have any experience with troubleshooting the new GPFS GUI? I've got it up and running but have a few weird problems with it... Maybe someone can help or point me in the right direction? 1. It keeps generating an alert saying that the cluster is down, when it isn't?? Event name: gui_cluster_down Component: GUI Entity type: Node Entity name: Event time: 17/05/2017 12:19:29 Message: The GUI detected that the cluster is down. Description: The GUI checks the cluster state. Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. User action: Check why the cluster lost quorum. Reporting node: Event type: Active health state of an entity which is monitored by the system. 2. It is collecting sensor data from the NSD nodes without any issue, but it won't collect sensor data from any of the client nodes? I have the pmsensors package installed on all the nodes in question , the service is enabled and running - the logs showing that it has connected to the collector. However in the GUI it just says "Performance collector did not return any data" 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. Would be great if anyone has any experience or ideas on how to troubleshoot this! Thanks Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed May 17 12:58:15 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Wed, 17 May 2017 07:58:15 -0400 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: References: Message-ID: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> I have issues as well with the gui. The issue that I had most similar to yours came about because I had installed the collector RPM and enabled collectors on two server nodes, but the GUI was only getting data from one of them. Each client randomly selected a collector to deliver data to. So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. Other issue I mentioned before is that it is totally confused about how many nodes are in the cluster (thinks 21, with 3 unhealthy) when there are only 12 nodes in all, all healthy. The nodes dashboard never finishes loading, and no means of digging deeper (text based info) to find out why it is wedged. ? ddj > On May 17, 2017, at 7:44 AM, Wilson, Neil wrote: > > Hello all, > > Does anyone have any experience with troubleshooting the new GPFS GUI? > I?ve got it up and running but have a few weird problems with it... > Maybe someone can help or point me in the right direction? > > 1. It keeps generating an alert saying that the cluster is down, when it isn?t?? > > Event name: > gui_cluster_down > Component: > GUI > Entity type: > Node > Entity name: > Event time: > 17/05/2017 12:19:29 > Message: > The GUI detected that the cluster is down. > Description: > The GUI checks the cluster state. > Cause: > The GUI calculated that an insufficient amount of quorum nodes is up and running. > User action: > Check why the cluster lost quorum. 
> Reporting node: > Event type: > Active health state of an entity which is monitored by the system. > > 2. It is collecting sensor data from the NSD nodes without any issue, but it won?t collect sensor data from any of the client nodes? > I have the pmsensors package installed on all the nodes in question , the service is enabled and running ? the logs showing that it has connected to the collector. > However in the GUI it just says ?Performance collector did not return any data? > > 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. > > > Would be great if anyone has any experience or ideas on how to troubleshoot this! > > Thanks > Neil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed May 17 13:23:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 May 2017 12:23:48 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: I don?t run the GUI in production, so I can?t comment on those issues specifically. I have been running a federated collector cluster for some time and it?s been working as expected. I?ve been using the Zimon-Grafana bridge code to look at GPFS performance stats. The other part of this is the mmhealth/mmsysmonitor process that reports events. It?s been problematic for me, especially in larger clusters (400+ nodes). The mmsysmonitor process is overloading the master node (the cluster manager) with too many ?heartbeats? and ends up causing lots of issues and log messages. Evidently this is something IBM is aware of (at the 4.2.2-2 level) and they have fixes coming out in 4.2.3 PTF1. I ended up disabling the cluster wide collection of health stats to prevent the cluster manager issues. However, be aware that CES depends on the mmhealth data so tinkering with the config make cause other issues if you use CES. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "David D. Johnson" Reply-To: gpfsug main discussion list Date: Wednesday, May 17, 2017 at 6:58 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS GUI So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Wed May 17 17:00:12 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 17 May 2017 18:00:12 +0200 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> References: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> Message-ID: Hello all, if multiple collectors should work together in a federation, the collector peers need to he specified in the ZimonCollectors.cfg. The GUI will see data from all collectors if federation is set up. See documentation below in the KC (works in 4.2.2 and 4.2.3 alike): https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_federation.htm For the issue related to the nodes count, can you contact me per PN? 
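To make that a bit more concrete, below is a minimal sketch of the two steps involved, assuming two collector nodes called nsd1 and nsd2 and assuming 9085 is the collector-to-collector (federation) port; please take the exact file location, syntax and port number from the knowledge center page linked above rather than from this sketch.

# 1. On BOTH collector nodes, declare every collector as a peer in the
#    ZimonCollectors.cfg mentioned above, along these lines:
#
#      peers = {
#              host = "nsd1"
#              port = "9085"
#      }, {
#              host = "nsd2"
#              port = "9085"
#      }
#
# 2. Restart the collector service on both nodes so the peering takes effect:
systemctl restart pmcollector

Once the collectors federate, it should no longer matter which collector a given pmsensors instance happens to report to; a query against either collector (and therefore the GUI) should see the combined data.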
Mit freundlichen Gr??en / Kind regards Markus Rohwedder IBM Spectrum Scale GUI Development From: "David D. Johnson" To: gpfsug main discussion list Date: 17.05.2017 13:59 Subject: Re: [gpfsug-discuss] GPFS GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org I have issues as well with the gui. The issue that I had most similar to yours came about because I had installed the collector RPM and enabled collectors on two server nodes, but the GUI was only getting data from one of them. Each client randomly selected a collector to deliver data to. So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. Other issue I mentioned before is that it is totally confused about how many nodes are in the cluster (thinks 21, with 3 unhealthy) when there are only 12 nodes in all, all healthy. The nodes dashboard never finishes loading, and no means of digging deeper (text based info) to find out why it is wedged. ? ddj On May 17, 2017, at 7:44 AM, Wilson, Neil < neil.wilson at metoffice.gov.uk> wrote: Hello all, Does anyone have any experience with troubleshooting the new GPFS GUI? I?ve got it up and running but have a few weird problems with it... Maybe someone can help or point me in the right direction? 1. It keeps generating an alert saying that the cluster is down, when it isn?t?? Event name: gui_cluster_down Component: GUI Entity type: Node Entity name: Event time: 17/05/2017 12:19:29 Message: The GUI detected that the cluster is down. Description: The GUI checks the cluster state. Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. User action: Check why the cluster lost quorum. Reporting node: Event type: Active health state of an entity which is monitored by the system. 2. It is collecting sensor data from the NSD nodes without any issue, but it won?t collect sensor data from any of the client nodes? I have the pmsensors package installed on all the nodes in question , the service is enabled and running ? the logs showing that it has connected to the collector. However in the GUI it just says ?Performance collector did not return any data? 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. Would be great if anyone has any experience or ideas on how to troubleshoot this! Thanks Neil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From carlz at us.ibm.com Wed May 17 17:11:40 2017 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 17 May 2017 16:11:40 +0000 Subject: [gpfsug-discuss] Brief survey on GPFS / Scale usage from Scale Development Message-ID: An HTML attachment was scrubbed... 
URL: From Christian.Fey at sva.de Wed May 17 20:09:42 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Wed, 17 May 2017 19:09:42 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration In-Reply-To: References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: <310fef91208741b0b8059e805077f40e@sva.de> Hi, we have an existing filesystem and want to move from homebrew Samba/CTDB to CES. Since there is a lot of data in it, relabeling / migrating is not an option. FS stays the same, only nodes that share the FS change. There is an option to change the range (delete the existing ranges, set the new ones) with "net idmap set range" but in my Lab setup I was not successful in changing it. --cut-- [root at gpfs4n1 src]# /usr/lpp/mmfs/bin/net idmap set range 0 S-1-5-21-123456789-... Failed to save domain mapping: NT_STATUS_INVALID_PARAMETER --cut-- Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Varun Mittal3 Gesendet: Montag, 15. Mai 2017 20:39 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for "Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster w]"Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The ol From: "Fey, Christian" > To: gpfsug main discussion list > Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? 
Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From Christian.Fey at sva.de Wed May 17 20:37:36 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Wed, 17 May 2017 19:37:36 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: <38b79da90bfc4c549c5971f06cfaf5e5@sva.de> I just got the information that there is a debugging switch for the "net" commands (-d10). Looks like the issue with setting the ranges is caused by my lab setup (complains that the ranges are still present). I will try again with a scratched config and report back. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: Fey, Christian Gesendet: Mittwoch, 17. Mai 2017 21:10 An: gpfsug main discussion list Betreff: AW: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi, we have an existing filesystem and want to move from homebrew Samba/CTDB to CES. Since there is a lot of data in it, relabeling / migrating is not an option. FS stays the same, only nodes that share the FS change. There is an option to change the range (delete the existing ranges, set the new ones) with "net idmap set range" but in my Lab setup I was not successful in changing it. --cut-- [root at gpfs4n1 src]# /usr/lpp/mmfs/bin/net idmap set range 0 S-1-5-21-123456789-... Failed to save domain mapping: NT_STATUS_INVALID_PARAMETER --cut-- Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Varun Mittal3 Gesendet: Montag, 15. Mai 2017 20:39 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? 
What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for "Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster w]"Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The ol From: "Fey, Christian" > To: gpfsug main discussion list > Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed May 17 21:44:47 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 16:44:47 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Message-ID: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? 
Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From luis.bolinches at fi.ibm.com Wed May 17 21:49:35 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 17 May 2017 23:49:35 +0300 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> Message-ID: Hi have you tried to add exceptions on the TSM client config file? Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 17-05-17 23:44 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 
These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 00:48:58 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 19:48:58 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> Message-ID: <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Quoting "Luis Bolinches" : > Hi > > have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. 
> > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Thu May 18 02:43:29 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 21:43:29 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. 
-------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : > Quoting "Luis Bolinches" : > >> Hi >> >> have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > >> >> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked >> on /IBM/GPFS/FSET1 >> >> dsm.sys >> ... >> >> DOMAIN /IBM/GPFS >> EXCLUDE.DIR /IBM/GPFS/FSET1 >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" >> Date: 17-05-17 23:44 >> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >> * project3 >> * scratch3 >> * sysadmin3 >> >> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >> have no need or space to include *scratch3* on TSM. >> >> Question: how to craft the mmbackup command to backup >> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >> >> Below are 3 types of errors: >> >> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> >> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >> dependent fileset sysadmin3 is not supported >> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> >> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope filesystem --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. 
>> >> These examples don't really cover my case: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >> Thanks >> Jaime >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Thu May 18 07:09:31 2017 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Thu, 18 May 2017 06:09:31 +0000 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu May 18 07:09:33 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 May 2017 06:09:33 +0000 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... 
URL: From p.childs at qmul.ac.uk Thu May 18 10:08:20 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 May 2017 10:08:20 +0100 Subject: [gpfsug-discuss] AFM Prefetch Missing Files In-Reply-To: References: Message-ID: Further investigation and checking says 4.2.1 afmctl prefetch is missing empty directories (not files as said previously) and noted by the update in 4.2.2.3. However I've found it is also missing symlinks both dangling (pointing to files that don't exist) and not. I can't see any actual data loss which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset) It looks like this should catch everything, But I'm wondering if anyone else has noticed any other things afmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: > I know it was said at the User group meeting last week that older > versions of afm prefetch miss empty files and that this is now fixed > in 4.2.2.3. > > We are in the middle of trying to migrate our files to a new > filesystem, and since that was said I'm double checking for any > mistakes etc. > > Anyway it looks like AFM prefetch also misses symlinks pointing to > files that that don't exist. ie "dangling symlinks" or ones that point > to files that either have not been created yet or have subsequently > been deleted. or when files have been decompressed and a symlink > extracted that points somewhere that is never going to exist. > > I'm still checking this, and as yet it does not look like its a data > loss issue, but it could still cause things to not quiet work once the > file migration is complete. > > Does anyone else know of any other types of files that might be missed > and I need to be aware of? > > We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" > using a gpfs policy to collect the list, we are using GPFS > Multi-cluster to connect the two filesystems not NFS.... > > Thanks in advanced > > > Peter Childs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From neil.wilson at metoffice.gov.uk Thu May 18 10:24:53 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Thu, 18 May 2017 09:24:53 +0000 Subject: [gpfsug-discuss] AFM Prefetch Missing Files In-Reply-To: References: Message-ID: We recently migrated several hundred TB from an Isilon cluster to our GPFS cluster using AFM using NFS gateways mostly using 4.2.2.2 , the main thing we noticed was that it would not migrate empty directories - we worked around the issue by getting a list of the missing directories and running it through a simple script that cd's into each directory then lists the empty directory. I didn't come across any issues with symlinks not being prefetched, just the directories. Regards Neil Wilson -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 18 May 2017 10:08 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] AFM Prefetch Missing Files Further investigation and checking says 4.2.1 afmctl prefetch is missing empty directories (not files as said previously) and noted by the update in 4.2.2.3. 
However I've found it is also missing symlinks both dangling (pointing to files that don't exist) and not. I can't see any actual data loss which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset) It looks like this should catch everything, But I'm wondering if anyone else has noticed any other things afmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: > I know it was said at the User group meeting last week that older > versions of afm prefetch miss empty files and that this is now fixed > in 4.2.2.3. > > We are in the middle of trying to migrate our files to a new > filesystem, and since that was said I'm double checking for any > mistakes etc. > > Anyway it looks like AFM prefetch also misses symlinks pointing to > files that that don't exist. ie "dangling symlinks" or ones that point > to files that either have not been created yet or have subsequently > been deleted. or when files have been decompressed and a symlink > extracted that points somewhere that is never going to exist. > > I'm still checking this, and as yet it does not look like its a data > loss issue, but it could still cause things to not quiet work once the > file migration is complete. > > Does anyone else know of any other types of files that might be missed > and I need to be aware of? > > We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" > using a gpfs policy to collect the list, we are using GPFS > Multi-cluster to connect the two filesystems not NFS.... > > Thanks in advanced > > > Peter Childs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu May 18 14:33:29 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 09:33:29 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset.... If you didn't say otherwise, inodes come from the default "root" fileset.... Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned.... 
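For anyone in the same position, here is a minimal sketch of the kind of wrapper Neil describes, assuming the list of missing directories has already been produced (for instance by running the find above against the home side and diffing it against the cache); the input file name and paths are made up:

#!/bin/bash
# missing_dirs.txt: one cache-side directory path per line.
while read -r dir; do
    # The lookup/readdir caused by cd + ls is what prompts AFM to create
    # the directory in the cache on demand.
    ( cd "$dir" 2>/dev/null && ls -a > /dev/null )
    # Flag anything that still is not there so it can be chased up manually.
    [ -d "$dir" ] || echo "still missing: $dir"
done < missing_dirs.txt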
BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. ----- Original message ----- From: "Jaime Pinto" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. -------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : > Quoting "Luis Bolinches" : > >> Hi >> >> have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. 
> > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > >> >> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked >> on /IBM/GPFS/FSET1 >> >> dsm.sys >> ... >> >> DOMAIN /IBM/GPFS >> EXCLUDE.DIR /IBM/GPFS/FSET1 >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" >> Date: 17-05-17 23:44 >> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >> * project3 >> * scratch3 >> * sysadmin3 >> >> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >> have no need or space to include *scratch3* on TSM. >> >> Question: how to craft the mmbackup command to backup >> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >> >> Below are 3 types of errors: >> >> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> >> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >> dependent fileset sysadmin3 is not supported >> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> >> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope filesystem --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> These examples don't really cover my case: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >> Thanks >> Jaime >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 14:58:51 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 09:58:51 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that are > in a separable range of inode numbers - this allows GPFS to efficiently do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? 
> > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor ESS, > so anyone in this list feel free to give feedback on that page people with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > >> Quoting "Luis Bolinches" : >> >>> Hi >>> >>> have you tried to add exceptions on the TSM client config file? >> >> Hey Luis, >> >> That would work as well (mechanically), however it's not elegant or >> efficient. When you have over 1PB and 200M files on scratch it will >> take many hours and several helper nodes to traverse that fileset just >> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >> Considering that I want to keep project and sysadmin on different >> domains then it's much worst, since we have to traverse and exclude >> scratch & (project|sysadmin) twice, once to capture sysadmin and again >> to capture project. 
>> >> If I have to use exclusion rules it has to rely sole on gpfs rules, and >> somehow not traverse scratch at all. >> >> I suspect there is a way to do this properly, however the examples on >> the gpfs guide and other references are not exhaustive. They only show >> a couple of trivial cases. >> >> However my situation is not unique. I suspect there are may facilities >> having to deal with backup of HUGE filesets. >> >> So the search is on. >> >> Thanks >> Jaime >> >> >> >> >>> >>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked >>> on /IBM/GPFS/FSET1 >>> >>> dsm.sys >>> ... >>> >>> DOMAIN /IBM/GPFS >>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" > >>> Date: 17-05-17 23:44 >>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>> * project3 >>> * scratch3 >>> * sysadmin3 >>> >>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>> have no need or space to include *scratch3* on TSM. >>> >>> Question: how to craft the mmbackup command to backup >>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>> >>> Below are 3 types of errors: >>> >>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> >>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>> dependent fileset sysadmin3 is not supported >>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> >>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope filesystem --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> These examples don't really cover my case: >>> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >>> >>> >>> Thanks >>> Jaime >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From p.childs at qmul.ac.uk Thu May 18 15:12:05 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 May 2017 15:12:05 +0100 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> As I understand it, mmbackup calls mmapplypolicy so this stands for mmapplypolicy too..... mmapplypolicy scans the metadata inodes (file) as requested depending on the query supplied. You can ask mmapplypolicy to scan a fileset, inode space or filesystem. If scanning a fileset it scans the inode space that fileset is dependant on, for all files in that fileset. 
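For illustration, a minimal sketch of the three scan scopes, assuming a
filesystem mounted at /gpfs/fs1 with a fileset fset1 linked at
/gpfs/fs1/fset1 (the paths, policy file name and fileset name here are
placeholders, not taken from this thread):

  # scan every file in the filesystem
  mmapplypolicy /gpfs/fs1 -P rules.pol --scope filesystem
  # scan the whole inode space that fset1 lives in
  mmapplypolicy /gpfs/fs1/fset1 -P rules.pol --scope inodespace
  # scan only the files that belong to fset1 itself
  mmapplypolicy /gpfs/fs1/fset1 -P rules.pol --scope fileset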
Smaller inode spaces hence less to scan, hence its faster to use an independent filesets, you get a list of what to process quicker. Another advantage is that once an inode is allocated you can't deallocate it, however you can delete independent filesets and hence deallocate the inodes, so if you have a task which has losts and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them work on them and then blow them away afterwards. I like independent filesets I'm guessing the only reason dependant filesets are used by default is history..... Peter On 18/05/17 14:58, Jaime Pinto wrote: > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? > > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >> think >> and try to read that as "inode space". >> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes >> that are >> in a separable range of inode numbers - this allows GPFS to >> efficiently do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be >> represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >> errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. 
It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people >> with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >> >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. >> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. >> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... 
>>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> These examples don't really cover my case: >>>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. 
>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From a.g.richmond at leeds.ac.uk Thu May 18 15:22:55 2017 From: a.g.richmond at leeds.ac.uk (Aidan Richmond) Date: Thu, 18 May 2017 15:22:55 +0100 Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain Message-ID: Hello I'm trying to join an AD domain for SMB and NFS protocol sharing but I keep getting a "Failed to generate the kerberos keytab file" error. The command I'm running is /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type ad --netbios-name @name@ --servers @adserver@ --user-name @username@ --idmap-role master --enable-nfs-kerberos --unixmap-domains "DS(1000-9999999)" A correct keytab does appears to be created on the host I run this on (one of two protocol nodes) but not on the other one. -- Aidan Richmond Apple/Unix Support Officer, IT Garstang 10.137 Faculty of Biological Sciences University of Leeds Clarendon Way LS2 9JT Tel:0113 3434252 a.g.richmond at leeds.ac.uk From makaplan at us.ibm.com Thu May 18 15:23:30 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 10:23:30 -0400 Subject: [gpfsug-discuss] What is an independent fileset? 
was: mmbackup with fileset : scope errors In-Reply-To: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset .... However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that are > in a separable range of inode numbers - this allows GPFS to efficiently do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... 
If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor ESS, > so anyone in this list feel free to give feedback on that page people with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > >> Quoting "Luis Bolinches" : >> >>> Hi >>> >>> have you tried to add exceptions on the TSM client config file? >> >> Hey Luis, >> >> That would work as well (mechanically), however it's not elegant or >> efficient. When you have over 1PB and 200M files on scratch it will >> take many hours and several helper nodes to traverse that fileset just >> to be negated by TSM. In fact exclusion on TSM are just as inefficient. 
>> Considering that I want to keep project and sysadmin on different >> domains then it's much worst, since we have to traverse and exclude >> scratch & (project|sysadmin) twice, once to capture sysadmin and again >> to capture project. >> >> If I have to use exclusion rules it has to rely sole on gpfs rules, and >> somehow not traverse scratch at all. >> >> I suspect there is a way to do this properly, however the examples on >> the gpfs guide and other references are not exhaustive. They only show >> a couple of trivial cases. >> >> However my situation is not unique. I suspect there are may facilities >> having to deal with backup of HUGE filesets. >> >> So the search is on. >> >> Thanks >> Jaime >> >> >> >> >>> >>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked >>> on /IBM/GPFS/FSET1 >>> >>> dsm.sys >>> ... >>> >>> DOMAIN /IBM/GPFS >>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" > >>> Date: 17-05-17 23:44 >>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>> * project3 >>> * scratch3 >>> * sysadmin3 >>> >>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>> have no need or space to include *scratch3* on TSM. >>> >>> Question: how to craft the mmbackup command to backup >>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>> >>> Below are 3 types of errors: >>> >>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> >>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>> dependent fileset sysadmin3 is not supported >>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> >>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope filesystem --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> These examples don't really cover my case: >>> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >>> >>> >>> Thanks >>> Jaime >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu May 18 15:24:17 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 18 May 2017 10:24:17 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> Message-ID: <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Here is one big reason independent filesets are problematic: A5.13: Table 43. 
Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University > On May 18, 2017, at 10:12 AM, Peter Childs wrote: > > As I understand it, > > mmbackup calls mmapplypolicy so this stands for mmapplypolicy too..... > > mmapplypolicy scans the metadata inodes (file) as requested depending on the query supplied. > > You can ask mmapplypolicy to scan a fileset, inode space or filesystem. > > If scanning a fileset it scans the inode space that fileset is dependant on, for all files in that fileset. Smaller inode spaces hence less to scan, hence its faster to use an independent filesets, you get a list of what to process quicker. > > Another advantage is that once an inode is allocated you can't deallocate it, however you can delete independent filesets and hence deallocate the inodes, so if you have a task which has losts and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them work on them and then blow them away afterwards. > > I like independent filesets I'm guessing the only reason dependant filesets are used by default is history..... > > > Peter > > > On 18/05/17 14:58, Jaime Pinto wrote: >> Thanks for the explanation Mark and Luis, >> >> It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. >> >> When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one way or another. >> >> I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that are >>> in a separable range of inode numbers - this allows GPFS to efficiently do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset.... If you didn't say otherwise, inodes come from the >>> default "root" fileset.... >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? 
Stay tuned.... >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss at spectrumscale.org >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor ESS, >>> so anyone in this list feel free to give feedback on that page people with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the data from that fileset to a new >>> independent fileset (multiple ways to do that) or use the TSM client >>> config. >>> >>> ----- Original message ----- >>> From: "Jaime Pinto" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: "gpfsug main discussion list" , >>> "Jaime Pinto" >>> Cc: >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Date: Thu, May 18, 2017 4:43 AM >>> >>> There is hope. See reference link below: >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >>> >>> >>> The issue has to do with dependent vs. independent filesets, something >>> I didn't even realize existed until now. Our filesets are dependent >>> (for no particular reason), so I have to find a way to turn them into >>> independent. >>> >>> The proper option syntax is "--scope inodespace", and the error >>> message actually flagged that out, however I didn't know how to >>> interpret what I saw: >>> >>> >>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> -------------------------------------------------------- >>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>> 21:27:43 EDT 2017. >>> -------------------------------------------------------- >>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>> fileset sysadmin3 is not supported >>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> -------------------------------------------------------- >>> >>> Will post the outcome. >>> Jaime >>> >>> >>> >>> Quoting "Jaime Pinto" : >>> >>>> Quoting "Luis Bolinches" : >>>> >>>>> Hi >>>>> >>>>> have you tried to add exceptions on the TSM client config file? >>>> >>>> Hey Luis, >>>> >>>> That would work as well (mechanically), however it's not elegant or >>>> efficient. When you have over 1PB and 200M files on scratch it will >>>> take many hours and several helper nodes to traverse that fileset just >>>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. 
>>>> Considering that I want to keep project and sysadmin on different >>>> domains then it's much worst, since we have to traverse and exclude >>>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>>> to capture project. >>>> >>>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>>> somehow not traverse scratch at all. >>>> >>>> I suspect there is a way to do this properly, however the examples on >>>> the gpfs guide and other references are not exhaustive. They only show >>>> a couple of trivial cases. >>>> >>>> However my situation is not unique. I suspect there are may facilities >>>> having to deal with backup of HUGE filesets. >>>> >>>> So the search is on. >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>>> >>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>> linked >>>>> on /IBM/GPFS/FSET1 >>>>> >>>>> dsm.sys >>>>> ... >>>>> >>>>> DOMAIN /IBM/GPFS >>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>> >>>>> >>>>> From: "Jaime Pinto" >>>>> To: "gpfsug main discussion list" >>> >>>>> Date: 17-05-17 23:44 >>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> >>>>> >>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>> * project3 >>>>> * scratch3 >>>>> * sysadmin3 >>>>> >>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>> have no need or space to include *scratch3* on TSM. >>>>> >>>>> Question: how to craft the mmbackup command to backup >>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>> >>>>> Below are 3 types of errors: >>>>> >>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>> dependent fileset sysadmin3 is not supported >>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>> fileset level backup. exit 1 >>>>> >>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> These examples don't really cover my case: >>>>> >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >>> >>>>> >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>>> Toronto. 
>>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>> Oy IBM Finland Ab >>>>> PL 265, 00101 Helsinki, Finland >>>>> Business ID, Y-tunnus: 0195876-3 >>>>> Registered in Finland >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu May 18 15:32:42 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:32:42 +0000 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Message-ID: Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of David D. Johnson Sent: 18 May 2017 15:24 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Here is one big reason independent filesets are problematic: A5.13: Table 43. Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 18 15:36:33 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:36:33 +0000 Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain In-Reply-To: References: Message-ID: It's crappy, I had to put the command in 10+ times before it would work. Just keep at it (that's my takeaway, sorry I'm not that technical when it comes to Kerberos). Could be a domain replication thing. Is time syncing properly across all your CES nodes? Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aidan Richmond Sent: 18 May 2017 15:23 To: gpfsug main discussion list Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain Hello I'm trying to join an AD domain for SMB and NFS protocol sharing but I keep getting a "Failed to generate the kerberos keytab file" error. The command I'm running is /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type ad --netbios-name @name@ --servers @adserver@ --user-name @username@ --idmap-role master --enable-nfs-kerberos --unixmap-domains "DS(1000-9999999)" A correct keytab does appears to be created on the host I run this on (one of two protocol nodes) but not on the other one. 
-- Aidan Richmond Apple/Unix Support Officer, IT Garstang 10.137 Faculty of Biological Sciences University of Leeds Clarendon Way LS2 9JT Tel:0113 3434252 a.g.richmond at leeds.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu May 18 15:47:59 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 18 May 2017 10:47:59 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Message-ID: <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn?t a single point of contention for creating files. I?m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I?m sure that limit could be changed eventually, there?s probably some efficiencies in not making it much bigger than it needs to be. I don?t know if it would take an on-disk format change or not. So how do you decide that a use case gets it?s own fileset, and do you just always use independent or is there an evaluation? I?m just curious because I like to understand lots of different points of view ? feel free to tell me to go away. :) -- Stephen > On May 18, 2017, at 10:32 AM, Sobey, Richard A > wrote: > > Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. > > I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ]On Behalf Of David D. Johnson > Sent: 18 May 2017 15:24 > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors > > Here is one big reason independent filesets are problematic: > A5.13: > Table 43. Maximum number of filesets > Version of GPFS > Maximum Number of Dependent Filesets > Maximum Number of Independent Filesets > IBM Spectrum Scale V4 > 10,000 > 1,000 > GPFS V3.5 > 10,000 > 1,000 > Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. > If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. > This is true of the root namespace as well, but there?s only one number to watch per filesystem. > > ? 
ddj > Dave Johnson > Brown University > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 18 15:58:20 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:58:20 +0000 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> Message-ID: So it could be that we didn?t really know what we were doing when our system was installed (and still don?t by some of the messages I post *cough*) but basically I think we?re quite similar to other shops where we resell GPFS to departmental users internally and it just made some sense to break down each one into a fileset. We can then snapshot each one individually (7402 snapshots at the moment) and apply quotas. I know your question was why independent and not dependent ? but I honestly don?t know. I assume it?s to do with not crossing the streams if you?ll excuse the obvious film reference. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stephen Ulmer Sent: 18 May 2017 15:48 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn?t a single point of contention for creating files. I?m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I?m sure that limit could be changed eventually, there?s probably some efficiencies in not making it much bigger than it needs to be. I don?t know if it would take an on-disk format change or not. So how do you decide that a use case gets it?s own fileset, and do you just always use independent or is there an evaluation? I?m just curious because I like to understand lots of different points of view ? feel free to tell me to go away. :) -- Stephen On May 18, 2017, at 10:32 AM, Sobey, Richard A > wrote: Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org]On Behalf Of David D. Johnson Sent: 18 May 2017 15:24 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] What is an independent fileset? 
was: mmbackup with fileset : scope errors Here is one big reason independent filesets are problematic: A5.13: Table 43. Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Thu May 18 16:15:30 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 18 May 2017 16:15:30 +0100 Subject: [gpfsug-discuss] Save the date SSUG 2018 - April 18th/19th 2018 Message-ID: Hi All, A date for your diary, #SSUG18 in the UK will be taking place on: April 18th, 19th 2018 Please mark it in your diaries now :-) We'll confirm other details etc nearer the time, but date is confirmed. Simon From john.hearns at asml.com Thu May 18 16:23:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 18 May 2017 15:23:29 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Good afternoon all, my name is John Hearns. I am currently working with the HPC Team at ASML in the Netherlands, the market sector is manufacturing. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 17:36:46 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 12:36:46 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. 
It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
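One low-risk way to sanity-check a draft rules file like the one above, before handing anything to mmbackup, is to run it through mmapplypolicy in test mode and read back which files each rule selects or excludes. The sketch below reuses the helper node, work directory and fileset names already quoted in this thread; the rules file path is made up, and -I test is used so the external list script is never actually invoked:

    # hypothetical location of the draft rules shown above
    RULES=/root/mmpolicyRules.sysadmin3.draft

    # evaluate the rules without executing them; -L 3 reports each
    # candidate file together with the rule that matched it
    mmapplypolicy /gpfs/sgfs1/sysadmin3 -P $RULES -I test \
        -N tsm-helper1-ib0 -s /dev/shm --scope fileset -L 3

Whether mmbackup itself will accept the same rules unchanged is a separate question, which is what the -P discussion in this thread is about.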
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think >> and try to read that as "inode space". >> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes that > are >> in a separable range of inode numbers - this allows GPFS to efficiently > do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be > represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people > with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. 
>> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. >> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... >>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. 
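Before fighting the --scope options it can save time to confirm what kind of fileset is actually there, since error 2) above is simply mmbackup refusing a dependent one. A sketch, using the filesystem and fileset names from this thread:

    # -L shows, per fileset, which inode space it lives in and its
    # MaxInodes/AllocInodes; a dependent fileset shares the inode space
    # of its parent (usually root), an independent one has its own
    mmlsfileset sgfs1 -L

    # for an *independent* fileset that has merely run out of inodes,
    # the limit can be raised in place (the number below is arbitrary)
    mmchfileset sgfs1 sysadmin3 --inode-limit 2000000

The second command only applies once the fileset is independent; on a dependent fileset the inode limit is a property of the containing inode space.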
>>>> >>>> These examples don't really cover my case: >>>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. 
(MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From makaplan at us.ibm.com Thu May 18 18:05:59 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 13:05:59 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Message-ID: 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air.... Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan..... Plan your Test and Test your Plan.... Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM.... otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. 
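A concrete shape for option 2 above (capture the rules mmbackup generates, tweak them, feed them back with -P) might look like the following. The directory holding the autogenerated ruleset is the one Jez gives later in this thread; the copied filename and the edits are only illustrative, and Marc's warnings about incremental backups and restores apply to whatever comes out of this:

    # during (or right after) a normal mmbackup run, once preflight has
    # passed, the generated ruleset sits under /var/mmfs/mmbackup/
    ls /var/mmfs/mmbackup/.mmbackupRules*

    # keep a copy and edit it, e.g. add FOR FILESET('sysadmin3') to the
    # selection rules, or an EXCLUDE rule for filesets to be skipped
    # (adjust the wildcard to the actual file name on your system)
    cp /var/mmfs/mmbackup/.mmbackupRules* /root/mmbackupRules.sysadmin3

    # re-run mmbackup with the customized rules
    mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm \
        -P /root/mmbackupRules.sysadmin3 --tsm-errorlog $logfile -L 2

Dry-run restores, as suggested above, are the only real proof that the resulting backups are complete.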
Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? > > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think >> and try to read that as "inode space". 
>> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes that > are >> in a separable range of inode numbers - this allows GPFS to efficiently > do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be > represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people > with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. >> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. 
>> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... >>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> These examples don't really cover my case: >>>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. 
(MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 20:02:46 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 15:02:46 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Message-ID: <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > thin air.... Capture the rules mmbackup creates and make small changes to > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > Plan.... Then do some dry run recoveries before you really "need" to do a > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. 
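On the -L question quoted above: per the option description earlier in this message, the higher levels only make the run more talkative about which files matched which rule; they are a way to verify a customized rules file rather than a way to obtain one. Purely as an illustration, reusing the hypothetical rules file sketched earlier in the thread:

    # -L 4 additionally lists every explicitly EXCLUDEd or LISTed file
    # and the rule responsible, which makes it easy to confirm that the
    # unwanted filesets really are being skipped
    mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm \
        -P /root/mmbackupRules.sysadmin3 --tsm-errorlog $logfile -L 4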
> > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. It would be nice to get comments from > somebody familiar with the inner works of mmbackup. > > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > >> Jaime, >> >> While we're waiting for the mmbackup expert to weigh in, notice that > the >> mmbackup command does have a -P option that allows you to provide a >> customized policy rules file. >> >> So... a fairly safe hack is to do a trial mmbackup run, capture the >> automatically generated policy file, and then augment it with FOR >> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for >> real with your customized policy file. >> >> mmbackup uses mmapplypolicy which by itself is happy to limit its >> directory scan to a particular fileset by using >> >> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope >> fileset .... >> >> However, mmbackup probably has other worries and for simpliciity and >> helping make sure you get complete, sensible backups, apparently has >> imposed some restrictions to preserve sanity (yours and our support > team! >> ;-) ) ... (For example, suppose you were doing incremental backups, >> starting at different paths each time? -- happy to do so, but when >> disaster strikes and you want to restore -- you'll end up confused > and/or >> unhappy!) >> >> "converting from one fileset to another" --- sorry there is no such > thing. >> Filesets are kinda like little filesystems within filesystems. Moving > a >> file from one fileset to another requires a copy operation. There is > no >> fast move nor hardlinking. >> >> --marc >> >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" > , >> "Marc A Kaplan" >> Date: 05/18/2017 09:58 AM >> Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: >> mmbackup with fileset : scope errors >> >> >> >> Thanks for the explanation Mark and Luis, >> >> It begs the question: why filesets are created as dependent by >> default, if the adverse repercussions can be so great afterward? Even >> in my case, where I manage GPFS and TSM deployments (and I have been >> around for a while), didn't realize at all that not adding and extra >> option at fileset creation time would cause me huge trouble with >> scaling later on as I try to use mmbackup. >> >> When you have different groups to manage file systems and backups that >> don't read each-other's manuals ahead of time then we have a really >> bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one >> way or another. 
>> >> I'm also hoping for a hint as to how to configure backup exclusion >> rules on the TSM side to exclude fileset traversing on the GPFS side. >> Is mmbackup smart enough (actually smarter than TSM client itself) to >> read the exclusion rules on the TSM configuration and apply them >> before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >> think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that >> are >>> in a separable range of inode numbers - this allows GPFS to efficiently >> do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be >> represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset.... If you didn't say otherwise, inodes come from the >>> default "root" fileset.... >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? Stay tuned.... >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you > may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss at spectrumscale.org >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >> errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor > ESS, >>> so anyone in this list feel free to give feedback on that page people >> with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the data from that fileset to a > new >>> independent fileset (multiple ways to do that) or use the TSM client >>> config. >>> >>> ----- Original message ----- >>> From: "Jaime Pinto" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: "gpfsug main discussion list" , >>> "Jaime Pinto" >>> Cc: >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Date: Thu, May 18, 2017 4:43 AM >>> >>> There is hope. See reference link below: >>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >>> >>> >>> The issue has to do with dependent vs. independent filesets, something >>> I didn't even realize existed until now. Our filesets are dependent >>> (for no particular reason), so I have to find a way to turn them into >>> independent. 
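Since, as Luis notes, there is no in-place conversion from dependent to independent, "turning them into independent" in practice means creating a new independent fileset and migrating the data into it. The outline below is only a sketch with made-up names and an arbitrary inode limit, and the forced delete at the end destroys whatever is left in the old fileset, so it needs careful verification before being run against real data:

    # new fileset with its own inode space, sized for the expected file count
    mmcrfileset sgfs1 sysadmin3i --inode-space new --inode-limit 5000000
    mmlinkfileset sgfs1 sysadmin3i -J /gpfs/sgfs1/sysadmin3i

    # copy the data across (rsync shown purely as an example)
    rsync -aHAX /gpfs/sgfs1/sysadmin3/ /gpfs/sgfs1/sysadmin3i/

    # once verified: retire the old dependent fileset and take over its path
    mmunlinkfileset sgfs1 sysadmin3
    mmdelfileset sgfs1 sysadmin3 -f
    mmunlinkfileset sgfs1 sysadmin3i
    mmchfileset sgfs1 sysadmin3i -j sysadmin3      # rename, if desired
    mmlinkfileset sgfs1 sysadmin3 -J /gpfs/sgfs1/sysadmin3

After that, mmbackup with --scope inodespace on the new fileset is exactly the supported case.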
>>> >>> The proper option syntax is "--scope inodespace", and the error >>> message actually flagged that out, however I didn't know how to >>> interpret what I saw: >>> >>> >>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> -------------------------------------------------------- >>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>> 21:27:43 EDT 2017. >>> -------------------------------------------------------- >>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>> fileset sysadmin3 is not supported >>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> -------------------------------------------------------- >>> >>> Will post the outcome. >>> Jaime >>> >>> >>> >>> Quoting "Jaime Pinto" : >>> >>>> Quoting "Luis Bolinches" : >>>> >>>>> Hi >>>>> >>>>> have you tried to add exceptions on the TSM client config file? >>>> >>>> Hey Luis, >>>> >>>> That would work as well (mechanically), however it's not elegant or >>>> efficient. When you have over 1PB and 200M files on scratch it will >>>> take many hours and several helper nodes to traverse that fileset just >>>> to be negated by TSM. In fact exclusion on TSM are just as > inefficient. >>>> Considering that I want to keep project and sysadmin on different >>>> domains then it's much worst, since we have to traverse and exclude >>>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>>> to capture project. >>>> >>>> If I have to use exclusion rules it has to rely sole on gpfs rules, > and >>>> somehow not traverse scratch at all. >>>> >>>> I suspect there is a way to do this properly, however the examples on >>>> the gpfs guide and other references are not exhaustive. They only show >>>> a couple of trivial cases. >>>> >>>> However my situation is not unique. I suspect there are may facilities >>>> having to deal with backup of HUGE filesets. >>>> >>>> So the search is on. >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>>> >>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>> linked >>>>> on /IBM/GPFS/FSET1 >>>>> >>>>> dsm.sys >>>>> ... >>>>> >>>>> DOMAIN /IBM/GPFS >>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>> >>>>> >>>>> From: "Jaime Pinto" >>>>> To: "gpfsug main discussion list" >>> >>>>> Date: 17-05-17 23:44 >>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> >>>>> >>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>> * project3 >>>>> * scratch3 >>>>> * sysadmin3 >>>>> >>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>> have no need or space to include *scratch3* on TSM. >>>>> >>>>> Question: how to craft the mmbackup command to backup >>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>> >>>>> Below are 3 types of errors: >>>>> >>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. 
>>>>> >>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>> dependent fileset sysadmin3 is not supported >>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>> fileset level backup. exit 1 >>>>> >>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> These examples don't really cover my case: >>>>> >>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>> >>>>> >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>>> Toronto. >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>> Oy IBM Finland Ab >>>>> PL 265, 00101 Helsinki, Finland >>>>> Business ID, Y-tunnus: 0195876-3 >>>>> Registered in Finland >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. 
>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jtucker at pixitmedia.com Thu May 18 20:32:54 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Thu, 18 May 2017 20:32:54 +0100 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Hi When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is > using as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. 
n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > >> 1. As I surmised, and I now have verification from Mr. mmbackup, >> mmbackup >> wants to support incremental backups (using what it calls its shadow >> database) and keep both your sanity and its sanity -- so mmbackup limits >> you to either full filesystem or full inode-space (independent fileset.) >> If you want to do something else, okay, but you have to be careful >> and be >> sure of yourself. IBM will not be able to jump in and help you if and >> when >> it comes time to restore and you discover that your backup(s) were not >> complete. >> >> 2. If you decide you're a big boy (or woman or XXX) and want to do some >> hacking ... Fine... But even then, I suggest you do the smallest hack >> that will mostly achieve your goal... >> DO NOT think you can create a custom policy rules list for mmbackup >> out of >> thin air.... Capture the rules mmbackup creates and make small >> changes to >> that -- >> And as with any disaster recovery plan..... Plan your Test and Test >> your >> Plan.... Then do some dry run recoveries before you really "need" to >> do a >> real recovery. >> >> I only even sugest this because Jaime says he has a huge filesystem with >> several dependent filesets and he really, really wants to do a partial >> backup, without first copying or re-organizing the filesets. >> >> HMMM.... otoh... if you have one or more dependent filesets that are >> smallish, and/or you don't need the backups -- create independent >> filesets, copy/move/delete the data, rename, voila. >> >> >> >> From: "Jaime Pinto" >> To: "Marc A Kaplan" >> Cc: "gpfsug main discussion list" >> Date: 05/18/2017 12:36 PM >> Subject: Re: [gpfsug-discuss] What is an independent fileset? >> was: >> mmbackup with fileset : scope errors >> >> >> >> Marc >> >> The -P option may be a very good workaround, but I still have to test >> it. >> >> I'm currently trying to craft the mm rule, as minimalist as possible, >> however I'm not sure about what attributes mmbackup expects to see. >> >> Below is my first attempt. It would be nice to get comments from >> somebody familiar with the inner works of mmbackup. >> >> Thanks >> Jaime >> >> >> /* A macro to abbreviate VARCHAR */ >> define([vc],[VARCHAR($1)]) >> >> /* Define three external lists */ >> RULE EXTERNAL LIST 'allfiles' EXEC >> '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' >> >> /* Generate a list of all files, directories, plus all other file >> system objects, >> like symlinks, named pipes, etc. 
Include the owner's id with each >> object and >> sort them by the owner's id */ >> >> RULE 'r1' LIST 'allfiles' >> DIRECTORIES_PLUS >> SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || >> vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) >> FROM POOL 'system' >> FOR FILESET('sysadmin3') >> >> /* Files in special filesets, such as those excluded, are never >> traversed >> */ >> RULE 'ExcSpecialFile' EXCLUDE >> FOR FILESET('scratch3','project3') >> >> >> >> >> >> Quoting "Marc A Kaplan" : >> >>> Jaime, >>> >>> While we're waiting for the mmbackup expert to weigh in, notice that >> the >>> mmbackup command does have a -P option that allows you to provide a >>> customized policy rules file. >>> >>> So... a fairly safe hack is to do a trial mmbackup run, capture the >>> automatically generated policy file, and then augment it with FOR >>> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup >> for >>> real with your customized policy file. >>> >>> mmbackup uses mmapplypolicy which by itself is happy to limit its >>> directory scan to a particular fileset by using >>> >>> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope >>> fileset .... >>> >>> However, mmbackup probably has other worries and for simpliciity and >>> helping make sure you get complete, sensible backups, apparently has >>> imposed some restrictions to preserve sanity (yours and our support >> team! >>> ;-) ) ... (For example, suppose you were doing incremental backups, >>> starting at different paths each time? -- happy to do so, but when >>> disaster strikes and you want to restore -- you'll end up confused >> and/or >>> unhappy!) >>> >>> "converting from one fileset to another" --- sorry there is no such >> thing. >>> Filesets are kinda like little filesystems within filesystems. Moving >> a >>> file from one fileset to another requires a copy operation. There is >> no >>> fast move nor hardlinking. >>> >>> --marc >>> >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" >> , >>> "Marc A Kaplan" >>> Date: 05/18/2017 09:58 AM >>> Subject: Re: [gpfsug-discuss] What is an independent fileset? >> was: >>> mmbackup with fileset : scope errors >>> >>> >>> >>> Thanks for the explanation Mark and Luis, >>> >>> It begs the question: why filesets are created as dependent by >>> default, if the adverse repercussions can be so great afterward? Even >>> in my case, where I manage GPFS and TSM deployments (and I have been >>> around for a while), didn't realize at all that not adding and extra >>> option at fileset creation time would cause me huge trouble with >>> scaling later on as I try to use mmbackup. >>> >>> When you have different groups to manage file systems and backups that >>> don't read each-other's manuals ahead of time then we have a really >>> bad recipe. >>> >>> I'm looking forward to your explanation as to why mmbackup cares one >>> way or another. >>> >>> I'm also hoping for a hint as to how to configure backup exclusion >>> rules on the TSM side to exclude fileset traversing on the GPFS side. >>> Is mmbackup smart enough (actually smarter than TSM client itself) to >>> read the exclusion rules on the TSM configuration and apply them >>> before traversing? >>> >>> Thanks >>> Jaime >>> >>> Quoting "Marc A Kaplan" : >>> >>>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >>> think >>>> and try to read that as "inode space". 
>>>> >>>> An "independent fileset" has all the attributes of an >>>> (older-fashioned) >>>> dependent fileset PLUS all of its files are represented by inodes that >>> are >>>> in a separable range of inode numbers - this allows GPFS to >>>> efficiently >>> do >>>> snapshots of just that inode-space (uh... independent fileset)... >>>> >>>> And... of course the files of dependent filesets must also be >>> represented >>>> by inodes -- those inode numbers are within the inode-space of >>>> whatever >>>> the containing independent fileset is... as was chosen when you >>>> created >>>> the fileset.... If you didn't say otherwise, inodes come from the >>>> default "root" fileset.... >>>> >>>> Clear as your bath-water, no? >>>> >>>> So why does mmbackup care one way or another ??? Stay tuned.... >>>> >>>> BTW - if you look at the bits of the inode numbers carefully --- you >> may >>>> not immediately discern what I mean by a "separable range of inode >>>> numbers" -- (very technical hint) you may need to permute the bit >>>> order >>>> before you discern a simple pattern... >>>> >>>> >>>> >>>> From: "Luis Bolinches" >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: gpfsug-discuss at spectrumscale.org >>>> Date: 05/18/2017 02:10 AM >>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >>> errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Hi >>>> >>>> There is no direct way to convert the one fileset that is dependent to >>>> independent or viceversa. >>>> >>>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots >> of >>>> definitions about GPFS ILM including filesets >>>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the >> only >>>> place that is explained but I honestly believe is a good single start >>>> point. It also needs an update as does nto have anything on CES nor >> ESS, >>>> so anyone in this list feel free to give feedback on that page people >>> with >>>> funding decisions listen there. >>>> >>>> So you are limited to either migrate the data from that fileset to a >> new >>>> independent fileset (multiple ways to do that) or use the TSM client >>>> config. >>>> >>>> ----- Original message ----- >>>> From: "Jaime Pinto" >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> To: "gpfsug main discussion list" , >>>> "Jaime Pinto" >>>> Cc: >>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Date: Thu, May 18, 2017 4:43 AM >>>> >>>> There is hope. See reference link below: >>>> >>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >> >> >>> >>>> >>>> >>>> The issue has to do with dependent vs. independent filesets, something >>>> I didn't even realize existed until now. Our filesets are dependent >>>> (for no particular reason), so I have to find a way to turn them into >>>> independent. >>>> >>>> The proper option syntax is "--scope inodespace", and the error >>>> message actually flagged that out, however I didn't know how to >>>> interpret what I saw: >>>> >>>> >>>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> -------------------------------------------------------- >>>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>>> 21:27:43 EDT 2017. 
>>>> -------------------------------------------------------- >>>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>>> fileset sysadmin3 is not supported >>>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> -------------------------------------------------------- >>>> >>>> Will post the outcome. >>>> Jaime >>>> >>>> >>>> >>>> Quoting "Jaime Pinto" : >>>> >>>>> Quoting "Luis Bolinches" : >>>>> >>>>>> Hi >>>>>> >>>>>> have you tried to add exceptions on the TSM client config file? >>>>> >>>>> Hey Luis, >>>>> >>>>> That would work as well (mechanically), however it's not elegant or >>>>> efficient. When you have over 1PB and 200M files on scratch it will >>>>> take many hours and several helper nodes to traverse that fileset >>>>> just >>>>> to be negated by TSM. In fact exclusion on TSM are just as >> inefficient. >>>>> Considering that I want to keep project and sysadmin on different >>>>> domains then it's much worst, since we have to traverse and exclude >>>>> scratch & (project|sysadmin) twice, once to capture sysadmin and >>>>> again >>>>> to capture project. >>>>> >>>>> If I have to use exclusion rules it has to rely sole on gpfs rules, >> and >>>>> somehow not traverse scratch at all. >>>>> >>>>> I suspect there is a way to do this properly, however the examples on >>>>> the gpfs guide and other references are not exhaustive. They only >>>>> show >>>>> a couple of trivial cases. >>>>> >>>>> However my situation is not unique. I suspect there are may >>>>> facilities >>>>> having to deal with backup of HUGE filesets. >>>>> >>>>> So the search is on. >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>>> linked >>>>>> on /IBM/GPFS/FSET1 >>>>>> >>>>>> dsm.sys >>>>>> ... >>>>>> >>>>>> DOMAIN /IBM/GPFS >>>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>>> >>>>>> >>>>>> From: "Jaime Pinto" >>>>>> To: "gpfsug main discussion list" >>>> >>>>>> Date: 17-05-17 23:44 >>>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope >>>>>> errors >>>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>>> >>>>>> >>>>>> >>>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>>> * project3 >>>>>> * scratch3 >>>>>> * sysadmin3 >>>>>> >>>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>>> have no need or space to include *scratch3* on TSM. >>>>>> >>>>>> Question: how to craft the mmbackup command to backup >>>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>>> >>>>>> Below are 3 types of errors: >>>>>> >>>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope >>>>>> filesystem >>>>>> cannot be specified at the same time. >>>>>> >>>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>>> dependent fileset sysadmin3 is not supported >>>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>>> fileset level backup. exit 1 >>>>>> >>>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope >>>>>> filesystem >>>>>> cannot be specified at the same time. 
>>>>>> >>>>>> These examples don't really cover my case: >>>>>> >>>> >>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >>> >>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> Jaime >>>>>> >>>>>> >>>>>> ************************************ >>>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>>> http://www.scinethpc.ca/testimonials >>>>>> ************************************ >>>>>> --- >>>>>> Jaime Pinto >>>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>>> University of Toronto >>>>>> 661 University Ave. (MaRS), Suite 1140 >>>>>> Toronto, ON, M5G1M1 >>>>>> P: 416-978-2755 >>>>>> C: 416-505-1477 >>>>>> >>>>>> ---------------------------------------------------------------- >>>>>> This message was sent using IMP at SciNet Consortium, University of >>>>>> Toronto. >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at spectrumscale.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>>> Oy IBM Finland Ab >>>>>> PL 265, 00101 Helsinki, Finland >>>>>> Business ID, Y-tunnus: 0195876-3 >>>>>> Registered in Finland >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Thu May 18 22:46:49 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 18 May 2017 21:46:49 +0000 Subject: [gpfsug-discuss] Introduction In-Reply-To: Message-ID: Welcome! 
On May 17, 2017, 4:27:15 AM, neil.wilson at metoffice.gov.uk wrote: From: neil.wilson at metoffice.gov.uk To: gpfsug-discuss at spectrumscale.org Cc: Date: May 17, 2017 4:27:15 AM Subject: [gpfsug-discuss] Introduction Hi All, I help to run a gpfs cluster at the Met Office, Exeter, UK. The cluster is running GPFS 4.2.2.2, it?s used with slurm for batch work - primarily for postprocessing weather and climate change model data generated from our HPC. We currently have 8 NSD nodes with approx 3PB of storage with 70+ client nodes. Kind Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 18 22:55:34 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 18 May 2017 21:55:34 +0000 Subject: [gpfsug-discuss] RPM Packages Message-ID: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? Thanks Simon From makaplan at us.ibm.com Fri May 19 14:50:20 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 19 May 2017 09:50:20 -0400 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Easier than hacking mmbackup or writing/editing policy rules, mmbackup interprets your TSM INCLUDE/EXCLUDE configuration statements -- so that is a supported and recommended way of doing business... If that doesn't do it for your purposes... You're into some light hacking... So look inside the mmbackup and tsbackup33 scripts and you'll find some DEBUG variables that should allow for keeping work and temp files around ... including the generated policy rules. I'm calling this hacking "light", because I don't think you'll need to change the scripts, but just look around and see how you can use what's there to achieve your legitimate purposes. Even so, you will have crossed a line where IBM support is "informal" at best. From: Jez Tucker To: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 03:33 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez On 18/05/17 20:02, Jaime Pinto wrote: Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. 
Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air.... Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan..... Plan your Test and Test your Plan.... Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM.... otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... 
a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset .... However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset.... If you didn't say otherwise, inodes come from the default "root" fileset.... Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned.... BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... 
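Pulling the pieces above together, a minimal illustrative sketch of both routes (the device, fileset and helper-node names are just the ones quoted in this thread; the exact name of the autogenerated rules file and the /root/ scratch location are assumptions):

# An independent fileset owns its own inode space; the long listing shows which is which
mmlsfileset sgfs1 -L

# A fileset is only independent if it gets its own inode space at creation time
mmcrfileset sgfs1 sysadmin4 --inode-space new
mmlinkfileset sgfs1 sysadmin4 -J /gpfs/sgfs1/sysadmin4

# Capture-and-augment workflow for mmbackup -P:
# 1) trial run, letting mmbackup generate its own policy rules
mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog /tmp/tsm.err -L 2
# 2) keep a copy of the autogenerated ruleset (exact file name varies)
cp /var/mmfs/mmbackup/.mmbackupRules* /root/mmbackupRules.custom
# 3) add FOR FILESET('sysadmin3') clauses to the copy, then run for real
mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm -P /root/mmbackupRules.custom --tsm-errorlog /tmp/tsm.err -L 2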
From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. ----- Original message ----- From: "Jaime Pinto" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. -------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : Quoting "Luis Bolinches" : Hi have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... 
DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 17-05-17 23:44 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Jez Tucker Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Fri May 19 17:12:20 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Fri, 19 May 2017 16:12:20 +0000 Subject: [gpfsug-discuss] RPM Packages Message-ID: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). 
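For reference, a quick way to see which license edition package, if any, actually made it onto a node (a hedged example; the package name differs per edition, and mmlslicense only reports the designation the cluster has recorded):

# Any gpfs.license.* edition package installed on this node?
rpm -qa 'gpfs.license*'

# Per-node license designation as the cluster sees it
mmlslicense -L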
Here's what rpm says about itself

[root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm
/usr/lpp/mmfs
/usr/lpp/mmfs/bin
/usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag

This file seems to be some XML code with strings of numbers in it. Not sure what it does for you.

Mark

On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote:

Hi All,

Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them?
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri May 19 17:43:49 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 19 May 2017 16:43:49 +0000 Subject: [gpfsug-discuss] RPM Packages In-Reply-To: References: , Message-ID: Well, I installed it one node and it still claims that it's advanced licensed on the node (only after installing gpfs.adv of course). I know the license model for DME, but we've never installed the gpfs.license.standard packages before. I agree the XML string pro ably is used somewhere, just not clear if it's needed or not... My guess would be maybe the GUI uses it. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 19 May 2017 17:16 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RPM Packages Data Management Edition optionally replaces the traditional GPFS licensing model with a per-terabyte licensing fee, rather than a per-socket licensing fee. https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS216-158 Presumably installing this RPM is how you tell GPFS which licensing model you?re using. ~jonathon On 5/19/17, 10:12 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mark Bush" wrote: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From tpathare at sidra.org Sun May 21 09:40:42 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 08:40:42 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue Message-ID: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid 
: 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun May 21 09:59:38 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Sun, 21 May 2017 08:59:38 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Message-ID: Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tpathare at sidra.org Sun May 21 10:18:11 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 09:18:11 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Message-ID: <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpathare at sidra.org Sun May 21 10:19:23 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 09:19:23 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> Message-ID: <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: Tushar Pathare Date: Sunday, May 21, 2017 at 12:18 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Sun May 21 15:36:02 2017 From: oehmes at gmail.com (Sven Oehme) Date: Sun, 21 May 2017 14:36:02 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Message-ID: The reason is the default setting of : verbsRdmasPerConnection : 16 you can increase this , on smaller clusters i run on some with 1024, but its not advised to run this on 100's of nodes and not if you know exactly what you are doing. i would start by doubling it to 32 and see how much of the waiters disappear, then go to 64 if you still see too many. don't go beyond 128 unless somebody knowledgeable reviewed your config further going to 32 or 64 is very low risk if you already run with verbs send enabled and don't have issues. On Sun, May 21, 2017 at 2:19 AM Tushar Pathare wrote: > Hello Aaron, > > Yes we saw recently an issue with > > > > VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 > (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 > > And > > > > VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( > sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 > > > > Thanks > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > > > *From: *Tushar Pathare > *Date: *Sunday, May 21, 2017 at 12:18 PM > > > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] VERBS RDMA issue > > > > Hello Aaron, > > Yes we saw recently an issue with > > > > VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 > (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 > > And > > > > > > > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > > > *From: * on behalf of "Knister, > Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" > *Reply-To: *gpfsug main discussion list > *Date: *Sunday, May 21, 2017 at 11:59 AM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] VERBS RDMA issue > > > > Hi Tushar, > > > > For me the issue was an underlying performance bottleneck (some CPU > frequency scaling problems causing cores to throttle back when it wasn't > appropriate). > > > > I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the > past to turn this off under certain conditions although I don't remember > what those where. Hopefully others can chime in and qualify that. > > > > > Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the > mmfs.log). > > > > > -Aaron > > > > > > On May 21, 2017 at 04:41:00 EDT, Tushar Pathare > wrote: > > Hello Team, > > > > We are facing a lot of messages waiters related to *waiting for conn > rdmas < conn maxrdmas > * > > > > Is there some recommended settings to resolve this issue.? > > Our config for RDMA is as follows for 140 nodes(32 cores each) > > > > > > VERBS RDMA Configuration: > > Status : started > > Start time : Thu > > Stats reset time : Thu > > Dump time : Sun > > mmfs verbsRdma : enable > > mmfs verbsRdmaCm : disable > > mmfs verbsPorts : mlx4_0/1 mlx4_0/2 > > mmfs verbsRdmasPerNode : 3200 > > mmfs verbsRdmasPerNode (max) : 3200 > > mmfs verbsRdmasPerNodeOptimize : yes > > mmfs verbsRdmasPerConnection : 16 > > mmfs verbsRdmasPerConnection (max) : 16 > > mmfs verbsRdmaMinBytes : 16384 > > mmfs verbsRdmaRoCEToS : -1 > > mmfs verbsRdmaQpRtrMinRnrTimer : 18 > > mmfs verbsRdmaQpRtrPathMtu : 2048 > > mmfs verbsRdmaQpRtrSl : 0 > > mmfs verbsRdmaQpRtrSlDynamic : no > > mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 > > mmfs verbsRdmaQpRtsRnrRetry : 6 > > mmfs verbsRdmaQpRtsRetryCnt : 6 > > mmfs verbsRdmaQpRtsTimeout : 18 > > mmfs verbsRdmaMaxSendBytes : 16777216 > > mmfs verbsRdmaMaxSendSge : 27 > > mmfs verbsRdmaSend : yes > > mmfs verbsRdmaSerializeRecv : no > > mmfs verbsRdmaSerializeSend : no > > mmfs verbsRdmaUseMultiCqThreads : yes > > mmfs verbsSendBufferMemoryMB : 1024 > > mmfs verbsLibName : libibverbs.so > > mmfs verbsRdmaCmLibName : librdmacm.so > > mmfs verbsRdmaMaxReconnectInterval : 60 > > mmfs verbsRdmaMaxReconnectRetries : -1 > > mmfs verbsRdmaReconnectAction : disable > > mmfs verbsRdmaReconnectThreads : 32 > > mmfs verbsHungRdmaTimeout : 90 > > ibv_fork_support : true > > Max connections : 196608 > > Max RDMA size : 16777216 > > Target number of vsend buffs : 16384 > > Initial vsend buffs per conn : 59 > > nQPs : 140 > > nCQs : 282 > > nCMIDs : 0 > > nDtoThreads : 2 > > nextIndex : 141 > > Number of Devices opened : 1 > > Device : mlx4_0 > > vendor_id : 713 > > Device vendor_part_id : 4099 > > Device mem register chunk : 8589934592 <(858)%20993-4592> > (0x200000000) > > Device max_sge : 32 > > Adjusted max_sge : 0 > > Adjusted max_sge vsend : 30 > > Device max_qp_wr : 16351 > > Device max_qp_rd_atom : 16 > > Open Connect Ports : 1 > > verbsConnectPorts[0] : mlx4_0/1/0 > > lid : 129 > > state : IBV_PORT_ACTIVE > > path_mtu : 2048 > > interface ID : 0xe41d2d030073b9d1 > > sendChannel.ib_channel : 0x7FA6CB816200 > > sendChannel.dtoThreadP : 0x7FA6CB821870 > > sendChannel.dtoThreadId : 12540 > > sendChannel.nFreeCq : 1 > > recvChannel.ib_channel : 0x7FA6CB81D590 > > recvChannel.dtoThreadP : 0x7FA6CB822BA0 > > recvChannel.dtoThreadId : 12541 > > recvChannel.nFreeCq : 1 > > ibv_cq : 0x7FA2724C81F8 > > ibv_cq.cqP : 0x0 > > ibv_cq.nEvents : 0 > > ibv_cq.contextP : 0x0 > > 
ibv_cq.ib_channel : 0x0 > > > > Thanks > > > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > Disclaimer: This email and its attachments may be confidential and are > intended solely for the use of the individual to whom it is addressed. If > you are not the intended recipient, any reading, printing, storage, > disclosure, copying or any other action taken in respect of this e-mail is > prohibited and may be unlawful. If you are not the intended recipient, > please notify the sender immediately by using the reply function and then > permanently delete what you have received. Any views or opinions expressed > are solely those of the author and do not necessarily represent those of > Sidra Medical and Research Center. > > Disclaimer: This email and its attachments may be confidential and are > intended solely for the use of the individual to whom it is addressed. If > you are not the intended recipient, any reading, printing, storage, > disclosure, copying or any other action taken in respect of this e-mail is > prohibited and may be unlawful. If you are not the intended recipient, > please notify the sender immediately by using the reply function and then > permanently delete what you have received. Any views or opinions expressed > are solely those of the author and do not necessarily represent those of > Sidra Medical and Research Center. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpathare at sidra.org Sun May 21 16:56:40 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 15:56:40 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Message-ID: Thanks Sven. Will read more about it and discuss with the team to come to a conclusion Thank you for pointing out the param. Will let you know the results after the tuning. Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 5:36 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue The reason is the default setting of : verbsRdmasPerConnection : 16 you can increase this , on smaller clusters i run on some with 1024, but its not advised to run this on 100's of nodes and not if you know exactly what you are doing. 
i would start by doubling it to 32 and see how much of the waiters disappear, then go to 64 if you still see too many. don't go beyond 128 unless somebody knowledgeable reviewed your config further going to 32 or 64 is very low risk if you already run with verbs send enabled and don't have issues. On Sun, May 21, 2017 at 2:19 AM Tushar Pathare > wrote: Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: Tushar Pathare > Date: Sunday, May 21, 2017 at 12:18 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: > on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > Reply-To: gpfsug main discussion list > Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare > wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
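Put concretely, Sven's advice above boils down to something like the following; the values are the ones he names, the commands are a sketch rather than a tested procedure, and depending on the code level the new value may only take effect after GPFS is recycled on the affected nodes, so stage it on a few nodes first:

# step the per-connection RDMA limit up gradually, as suggested
mmchconfig verbsRdmasPerConnection=32
# once the new value is active, re-check whether the "waiting for conn rdmas" waiters persist
mmdiag --waiters | grep -i "conn rdmas"
# if many remain, try 64; per the note above, do not go beyond 128 without an expert review
mmchconfig verbsRdmasPerConnection=64
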
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed May 24 10:43:37 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 24 May 2017 09:43:37 +0000 Subject: [gpfsug-discuss] Report on Scale and Cloud Message-ID: Hi All, I forgot that I never circulated, as part of the RCUK Working Group on Cloud, we produced a report on using Scale with Cloud/Undercloud ... You can download the report from: https://cloud.ac.uk/reports/spectrumscale/ We had some input from various IBM people whilst writing, and bear in mind that its a snapshot of support at the point in time when it was written. Simon From kkr at lbl.gov Wed May 24 20:57:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 24 May 2017 12:57:49 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Message-ID: Hello, We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the ZIMonAddress variable in /usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. 
Thanks, Kristy The peers are added into the ZIMonCollector.cfg using the default port 9085: peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" } And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. cfg: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "hostA.nersc.gov " port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:03:54 0.54 3.67 4961 2 2017-05-23-17:03:55 0.63 3.55 6199 3 2017-05-23-17:03:56 1.59 3.76 7914 4 2017-05-23-17:03:57 1.38 5.34 5393 5 2017-05-23-17:03:58 0.54 2.21 2435 6 2017-05-23-17:03:59 0.13 0.29 2519 7 2017-05-23-17:04:00 0.13 0.25 2197 8 2017-05-23-17:04:01 0.13 0.29 2473 9 2017-05-23-17:04:02 0.08 0.21 2336 10 2017-05-23-17:04:03 0.13 0.21 2312 [root@ hostB ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:04:07 0.13 0.21 2010 2 2017-05-23-17:04:08 0.04 0.21 2571 3 2017-05-23-17:04:09 0.08 0.25 2766 4 2017-05-23-17:04:10 0.13 0.29 3147 5 2017-05-23-17:04:11 0.83 0.83 2596 6 2017-05-23-17:04:12 0.33 0.54 2530 7 2017-05-23-17:04:13 0.08 0.33 2428 8 2017-05-23-17:04:14 0.13 0.25 2326 9 2017-05-23-17:04:15 0.13 0.29 4190 10 2017-05-23-17:04:16 0.58 1.92 5882 [root@ hostB ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:45 0.33 0.46 7460 2 2017-05-23-17:05:46 0.33 0.42 8993 3 2017-05-23-17:05:47 0.42 0.54 8709 4 2017-05-23-17:05:48 0.38 0.5 5923 5 2017-05-23-17:05:49 0.54 1.46 7381 6 2017-05-23-17:05:50 0.58 3.51 10381 7 2017-05-23-17:05:51 1.05 1.13 10995 8 2017-05-23-17:05:52 0.88 0.92 10855 9 2017-05-23-17:05:53 0.5 0.63 10958 10 2017-05-23-17:05:54 0.5 0.59 10285 [root@ hostA ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:50 0.58 3.51 10381 2 2017-05-23-17:05:51 1.05 1.13 10995 3 2017-05-23-17:05:52 0.88 0.92 10855 4 2017-05-23-17:05:53 0.5 0.63 10958 5 2017-05-23-17:05:54 0.5 0.59 10285 6 2017-05-23-17:05:55 0.46 0.63 11621 7 2017-05-23-17:05:56 0.84 0.92 11477 8 2017-05-23-17:05:57 1.47 1.88 11084 9 2017-05-23-17:05:58 0.46 1.76 9125 10 2017-05-23-17:05:59 0.42 0.63 11745 -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylorm at us.ibm.com Thu May 25 14:46:06 2017 From: taylorm at us.ibm.com (Michael L Taylor) Date: Thu, 25 May 2017 06:46:06 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi Kristy, At first glance your config looks ok. Here are a few things to check. Is 4.2.3 the first time you have installed and configured performance monitoring? Or have you configured it at some version < 4.2.3 and then upgraded to 4.2.3? 
Did you restart pmcollector after changing the configuration? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_guienableperfmon.htm "Configure peer configuration for the collectors. The collector configuration is stored in the /opt/IBM/zimon/ZIMonCollector.cfg file. This file defines collector peer configuration and the aggregation rules. If you are using only a single collector, you can skip this step. Restart the pmcollector service after making changes to the configuration file. The GUI must have access to all data from each GUI node. " Firewall ports are open for performance monitoring and MGMT GUI? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforgui.htm?cp=STXKQY https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforPMT.htm Did you setup the collectors with : prompt# mmperfmon config generate --collectors collector1.domain.com,collector2.domain.com,? Once the configuration file has been stored within IBM Spectrum Scale, it can be activated as follows. prompt# mmchnode --perfmon ?N nodeclass1,nodeclass2,? Perhaps once you make sure the federated mode is set between hostA and hostB as you like then 'systemctl restart pmcollector' and then 'systemctl restart gpfsgui' on both nodes? From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 05/24/2017 12:58 PM Subject: gpfsug-discuss Digest, Vol 64, Issue 61 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. SS Metrics (Zimon) and SS GUI, Federation not working (Kristy Kallback-Rose) ---------------------------------------------------------------------- Message: 1 Date: Wed, 24 May 2017 12:57:49 -0700 From: Kristy Kallback-Rose To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Message-ID: Content-Type: text/plain; charset="utf-8" Hello, We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the ZIMonAddress variable in /usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. 
I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. Thanks, Kristy The peers are added into the ZIMonCollector.cfg using the default port 9085: peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" } And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. cfg: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "hostA.nersc.gov " port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:03:54 0.54 3.67 4961 2 2017-05-23-17:03:55 0.63 3.55 6199 3 2017-05-23-17:03:56 1.59 3.76 7914 4 2017-05-23-17:03:57 1.38 5.34 5393 5 2017-05-23-17:03:58 0.54 2.21 2435 6 2017-05-23-17:03:59 0.13 0.29 2519 7 2017-05-23-17:04:00 0.13 0.25 2197 8 2017-05-23-17:04:01 0.13 0.29 2473 9 2017-05-23-17:04:02 0.08 0.21 2336 10 2017-05-23-17:04:03 0.13 0.21 2312 [root@ hostB ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:04:07 0.13 0.21 2010 2 2017-05-23-17:04:08 0.04 0.21 2571 3 2017-05-23-17:04:09 0.08 0.25 2766 4 2017-05-23-17:04:10 0.13 0.29 3147 5 2017-05-23-17:04:11 0.83 0.83 2596 6 2017-05-23-17:04:12 0.33 0.54 2530 7 2017-05-23-17:04:13 0.08 0.33 2428 8 2017-05-23-17:04:14 0.13 0.25 2326 9 2017-05-23-17:04:15 0.13 0.29 4190 10 2017-05-23-17:04:16 0.58 1.92 5882 [root@ hostB ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:45 0.33 0.46 7460 2 2017-05-23-17:05:46 0.33 0.42 8993 3 2017-05-23-17:05:47 0.42 0.54 8709 4 2017-05-23-17:05:48 0.38 0.5 5923 5 2017-05-23-17:05:49 0.54 1.46 7381 6 2017-05-23-17:05:50 0.58 3.51 10381 7 2017-05-23-17:05:51 1.05 1.13 10995 8 2017-05-23-17:05:52 0.88 0.92 10855 9 2017-05-23-17:05:53 0.5 0.63 10958 10 2017-05-23-17:05:54 0.5 0.59 10285 [root@ hostA ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:50 0.58 3.51 10381 2 2017-05-23-17:05:51 1.05 1.13 10995 3 2017-05-23-17:05:52 0.88 0.92 10855 4 2017-05-23-17:05:53 0.5 0.63 10958 5 2017-05-23-17:05:54 0.5 0.59 10285 6 2017-05-23-17:05:55 0.46 0.63 11621 7 2017-05-23-17:05:56 0.84 0.92 11477 8 2017-05-23-17:05:57 1.47 1.88 11084 9 2017-05-23-17:05:58 0.46 1.76 9125 10 2017-05-23-17:05:59 0.42 0.63 11745 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170524/e64509b9/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 64, Issue 61 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NSCHULD at de.ibm.com Thu May 25 15:13:16 2017 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Thu, 25 May 2017 16:13:16 +0200 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi, please upgrade to 4.2.3 ptf1 - the version before has an issue with federated queries in some situations. Mit freundlichen Gr??en / Kind regards Norbert Schuld From: Kristy Kallback-Rose To: gpfsug-discuss at spectrumscale.org Date: 24/05/2017 21:58 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, ? We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. ? hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the?ZIMonAddress variable in?/usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. ? I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. Thanks, Kristy ? The peers are added into the?ZIMonCollector.cfg using the default port 9085: ?peers = { ? ? ? ? host = "hostA" ? ? ? ? port = "9085" ?}, ?{ ? ? ? ? host = "hostB" ? ? ? ? port = "9085" ?} And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors.cfg: colCandidates = "hostA.nersc.gov", "hostB.nersc.gov" colRedundancy = 1 collectors = { host = "hostA.nersc.gov" port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov", "hostB.nersc.gov" colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]#? mmperfmon query cpu -N hostB Legend: ?1: hostB.nersc.gov|CPU|cpu_system ?2:?hostB.nersc.gov|CPU|cpu_user ?3:?hostB.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:03:54 ? ? ? 0.54 ? ? 3.67 ? ? 
? ? 4961 ? 2 2017-05-23-17:03:55 ? ? ? 0.63 ? ? 3.55 ? ? ? ? 6199 ? 3 2017-05-23-17:03:56 ? ? ? 1.59 ? ? 3.76 ? ? ? ? 7914 ? 4 2017-05-23-17:03:57 ? ? ? 1.38 ? ? 5.34 ? ? ? ? 5393 ? 5 2017-05-23-17:03:58 ? ? ? 0.54 ? ? 2.21 ? ? ? ? 2435 ? 6 2017-05-23-17:03:59 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 2519 ? 7 2017-05-23-17:04:00 ? ? ? 0.13 ? ? 0.25 ? ? ? ? 2197 ? 8 2017-05-23-17:04:01 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 2473 ? 9 2017-05-23-17:04:02 ? ? ? 0.08 ? ? 0.21 ? ? ? ? 2336 ?10 2017-05-23-17:04:03 ? ? ? 0.13 ? ? 0.21 ? ? ? ? 2312 [root@?hostB?~]#? mmperfmon query cpu -N?hostB Legend: ?1:?hostB.nersc.gov|CPU|cpu_system ?2:?hostB.nersc.gov|CPU|cpu_user ?3:?hostB.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:04:07 ? ? ? 0.13 ? ? 0.21 ? ? ? ? 2010 ? 2 2017-05-23-17:04:08 ? ? ? 0.04 ? ? 0.21 ? ? ? ? 2571 ? 3 2017-05-23-17:04:09 ? ? ? 0.08 ? ? 0.25 ? ? ? ? 2766 ? 4 2017-05-23-17:04:10 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 3147 ? 5 2017-05-23-17:04:11 ? ? ? 0.83 ? ? 0.83 ? ? ? ? 2596 ? 6 2017-05-23-17:04:12 ? ? ? 0.33 ? ? 0.54 ? ? ? ? 2530 ? 7 2017-05-23-17:04:13 ? ? ? 0.08 ? ? 0.33 ? ? ? ? 2428 ? 8 2017-05-23-17:04:14 ? ? ? 0.13 ? ? 0.25 ? ? ? ? 2326 ? 9 2017-05-23-17:04:15 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 4190 ?10 2017-05-23-17:04:16 ? ? ? 0.58 ? ? 1.92 ? ? ? ? 5882 [root@?hostB?~]#? mmperfmon query cpu -N?hostA Legend: ?1:?hostA.nersc.gov|CPU|cpu_system ?2:?hostA.nersc.gov|CPU|cpu_user ?3:?hostA.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:05:45 ? ? ? 0.33 ? ? 0.46 ? ? ? ? 7460 ? 2 2017-05-23-17:05:46 ? ? ? 0.33 ? ? 0.42 ? ? ? ? 8993 ? 3 2017-05-23-17:05:47 ? ? ? 0.42 ? ? 0.54 ? ? ? ? 8709 ? 4 2017-05-23-17:05:48 ? ? ? 0.38? ? ? 0.5 ? ? ? ? 5923 ? 5 2017-05-23-17:05:49 ? ? ? 0.54 ? ? 1.46 ? ? ? ? 7381 ? 6 2017-05-23-17:05:50 ? ? ? 0.58 ? ? 3.51? ? ? ? 10381 ? 7 2017-05-23-17:05:51 ? ? ? 1.05 ? ? 1.13? ? ? ? 10995 ? 8 2017-05-23-17:05:52 ? ? ? 0.88 ? ? 0.92? ? ? ? 10855 ? 9 2017-05-23-17:05:53? ? ? ? 0.5 ? ? 0.63? ? ? ? 10958 ?10 2017-05-23-17:05:54? ? ? ? 0.5 ? ? 0.59? ? ? ? 10285 [root@?hostA?~]#? mmperfmon query cpu -N?hostA Legend: ?1:?hostA.nersc.gov|CPU|cpu_system ?2:?hostA.nersc.gov|CPU|cpu_user ?3:?hostA.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:05:50 ? ? ? 0.58 ? ? 3.51? ? ? ? 10381 ? 2 2017-05-23-17:05:51 ? ? ? 1.05 ? ? 1.13? ? ? ? 10995 ? 3 2017-05-23-17:05:52 ? ? ? 0.88 ? ? 0.92? ? ? ? 10855 ? 4 2017-05-23-17:05:53? ? ? ? 0.5 ? ? 0.63? ? ? ? 10958 ? 5 2017-05-23-17:05:54? ? ? ? 0.5 ? ? 0.59? ? ? ? 10285 ? 6 2017-05-23-17:05:55 ? ? ? 0.46 ? ? 0.63? ? ? ? 11621 ? 7 2017-05-23-17:05:56 ? ? ? 0.84 ? ? 0.92? ? ? ? 11477 ? 8 2017-05-23-17:05:57 ? ? ? 1.47 ? ? 1.88? ? ? ? 11084 ? 9 2017-05-23-17:05:58 ? ? ? 0.46 ? ? 1.76 ? ? ? ? 9125 ?10 2017-05-23-17:05:59 ? ? ? 0.42 ? ? 0.63? ? ? ? 11745 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Thu May 25 22:51:32 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 May 2017 14:51:32 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi Michael, Norbert, Thanks for your replies, we did do all the setup as Michael described, and stop and restart services more than once ;-). I believe the issue is resolved with the PTF. I am still checking, but it seems to be working with symmetric peering between those two nodes. I will test further and expand to other nodes and make sure it continue to work. I will report back if I run into any other issues. Cheers, Kristy On Thu, May 25, 2017 at 6:46 AM, Michael L Taylor wrote: > Hi Kristy, > At first glance your config looks ok. Here are a few things to check. > > Is 4.2.3 the first time you have installed and configured performance > monitoring? Or have you configured it at some version < 4.2.3 and then > upgraded to 4.2.3? > > > Did you restart pmcollector after changing the configuration? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adv_guienableperfmon.htm > "Configure peer configuration for the collectors. The collector > configuration is stored in the /opt/IBM/zimon/ZIMonCollector.cfg file. > This file defines collector peer configuration and the aggregation rules. > If you are using only a single collector, you can skip this step. Restart > the pmcollector service after making changes to the configuration file. The > GUI must have access to all data from each GUI node. " > > Firewall ports are open for performance monitoring and MGMT GUI? > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforgui.htm?cp=STXKQY > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforPMT.htm > > Did you setup the collectors with : > prompt# mmperfmon config generate --collectors collector1.domain.com, > collector2.domain.com,? > > Once the configuration file has been stored within IBM Spectrum Scale, it > can be activated as follows. > prompt# mmchnode --perfmon ?N nodeclass1,nodeclass2,? > > Perhaps once you make sure the federated mode is set between hostA and > hostB as you like then 'systemctl restart pmcollector' and then 'systemctl > restart gpfsgui' on both nodes? 
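For anyone reproducing this, the restart-and-verify cycle referred to in the checklist above reduces to roughly the following on the two collector nodes (hostA/hostB as in the earlier output; this is a sketch, not an exact transcript of what was run here):

# on each collector, after adding the peers stanza to /opt/IBM/zimon/ZIMonCollector.cfg
systemctl restart pmcollector
# on the GUI node only
systemctl restart gpfsgui
# then, from either node, confirm that federated queries return the other collector's data
mmperfmon query cpu -N hostA
mmperfmon query cpu -N hostB
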
> > > > [image: Inactive hide details for gpfsug-discuss-request---05/24/2017 > 12:58:21 PM---Send gpfsug-discuss mailing list submissions to gp] > gpfsug-discuss-request---05/24/2017 12:58:21 PM---Send gpfsug-discuss > mailing list submissions to gpfsug-discuss at spectrumscale.org > > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 05/24/2017 12:58 PM > Subject: gpfsug-discuss Digest, Vol 64, Issue 61 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. SS Metrics (Zimon) and SS GUI, Federation not working > (Kristy Kallback-Rose) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 24 May 2017 12:57:49 -0700 > From: Kristy Kallback-Rose > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation > not working > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > > Hello, > > We have been experimenting with Zimon and the SS GUI on our dev cluster > under 4.2.3. Things work well with one collector, but I'm running into > issues when trying to use symmetric collector peers, i.e. federation. > > hostA and hostB are setup as both collectors and sensors with each a > collector peer for the other. When this is done I can use mmperfmon to > query hostA from hostA or hostB and vice versa. However, with this > federation setup, the GUI fails to show data. The GUI is running on hostB. > >From the collector candidate pool, hostA has been selected (automatically, > not manually) as can be seen in the sensor configuration file. The GUI is > unable to load data (just shows "Loading" on the graph), *unless* I change > the setting of the ZIMonAddress variable in > /usr/lpp/mmfs/gui/conf/gpfsgui.properties > from localhost to hostA explicitly, it does not work if I change it to > hostB explicitly. The GUI also works fine if I remove the peer entries > altogether and just have one collector. > > I thought that federation meant that no matter which collector was > queried the data would be returned. This appears to work for mmperfmon, but > not the GUI. Can anyone advise? I also don't like the idea of having a pool > of collector candidates and hard-coding one into the GUI configuration. I > am including some output below to show the configs and query results. > > Thanks, > > Kristy > > > The peers are added into the ZIMonCollector.cfg using the default port > 9085: > > peers = { > > host = "hostA" > > port = "9085" > > }, > > { > > host = "hostB" > > port = "9085" > > } > > > And the nodes are added as collector candidates, on hostA and hostB you > see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. 
> cfg: > > colCandidates = "hostA.nersc.gov ", " > hostB.nersc.gov " > > colRedundancy = 1 > > collectors = { > > host = "hostA.nersc.gov " > > port = "4739" > > } > > > Showing the config with mmperfmon config show: > > colCandidates = "hostA.nersc.gov ", " > hostB.nersc.gov " > > colRedundancy = 1 > > collectors = { > > host = "" > > > Using mmperfmon I can query either host. > > > [root at hostA ~]# mmperfmon query cpu -N hostB > > > Legend: > > 1: hostB.nersc.gov |CPU|cpu_system > > 2: hostB.nersc.gov |CPU|cpu_user > > 3: hostB.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:03:54 0.54 3.67 4961 > > 2 2017-05-23-17:03:55 0.63 3.55 6199 > > 3 2017-05-23-17:03:56 1.59 3.76 7914 > > 4 2017-05-23-17:03:57 1.38 5.34 5393 > > 5 2017-05-23-17:03:58 0.54 2.21 2435 > > 6 2017-05-23-17:03:59 0.13 0.29 2519 > > 7 2017-05-23-17:04:00 0.13 0.25 2197 > > 8 2017-05-23-17:04:01 0.13 0.29 2473 > > 9 2017-05-23-17:04:02 0.08 0.21 2336 > > 10 2017-05-23-17:04:03 0.13 0.21 2312 > > > [root@ hostB ~]# mmperfmon query cpu -N hostB > > > Legend: > > 1: hostB.nersc.gov |CPU|cpu_system > > 2: hostB.nersc.gov |CPU|cpu_user > > 3: hostB.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:04:07 0.13 0.21 2010 > > 2 2017-05-23-17:04:08 0.04 0.21 2571 > > 3 2017-05-23-17:04:09 0.08 0.25 2766 > > 4 2017-05-23-17:04:10 0.13 0.29 3147 > > 5 2017-05-23-17:04:11 0.83 0.83 2596 > > 6 2017-05-23-17:04:12 0.33 0.54 2530 > > 7 2017-05-23-17:04:13 0.08 0.33 2428 > > 8 2017-05-23-17:04:14 0.13 0.25 2326 > > 9 2017-05-23-17:04:15 0.13 0.29 4190 > > 10 2017-05-23-17:04:16 0.58 1.92 5882 > > > [root@ hostB ~]# mmperfmon query cpu -N hostA > > > Legend: > > 1: hostA.nersc.gov |CPU|cpu_system > > 2: hostA.nersc.gov |CPU|cpu_user > > 3: hostA.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:05:45 0.33 0.46 7460 > > 2 2017-05-23-17:05:46 0.33 0.42 8993 > > 3 2017-05-23-17:05:47 0.42 0.54 8709 > > 4 2017-05-23-17:05:48 0.38 0.5 5923 > > 5 2017-05-23-17:05:49 0.54 1.46 7381 > > 6 2017-05-23-17:05:50 0.58 3.51 10381 > > 7 2017-05-23-17:05:51 1.05 1.13 10995 > > 8 2017-05-23-17:05:52 0.88 0.92 10855 > > 9 2017-05-23-17:05:53 0.5 0.63 10958 > > 10 2017-05-23-17:05:54 0.5 0.59 10285 > > > [root@ hostA ~]# mmperfmon query cpu -N hostA > > > Legend: > > 1: hostA.nersc.gov |CPU|cpu_system > > 2: hostA.nersc.gov |CPU|cpu_user > > 3: hostA.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:05:50 0.58 3.51 10381 > > 2 2017-05-23-17:05:51 1.05 1.13 10995 > > 3 2017-05-23-17:05:52 0.88 0.92 10855 > > 4 2017-05-23-17:05:53 0.5 0.63 10958 > > 5 2017-05-23-17:05:54 0.5 0.59 10285 > > 6 2017-05-23-17:05:55 0.46 0.63 11621 > > 7 2017-05-23-17:05:56 0.84 0.92 11477 > > 8 2017-05-23-17:05:57 1.47 1.88 11084 > > 9 2017-05-23-17:05:58 0.46 1.76 9125 > > 10 2017-05-23-17:05:59 0.42 0.63 11745 > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: 20170524/e64509b9/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 64, Issue 61 > ********************************************** > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon May 29 21:01:38 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 29 May 2017 16:01:38 -0400 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: <20170529160138.18847jpj5x9kz8ki@support.scinet.utoronto.ca> Quoting "Marc A Kaplan" : > Easier than hacking mmbackup or writing/editing policy rules, > > mmbackup interprets > your TSM INCLUDE/EXCLUDE configuration statements -- so that is a > supported and recommended way of doing business... Finally got some time to resume testing on this Here is the syntax used (In this test I want to backup /wosgpfs/backmeup only) mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm --tsm-errorlog $logfile -L 4 As far as I can tell, the EXCLUDE statements in the TSM configuration (dsm.opt) are being *ignored*. I tried a couple of formats: 1) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" INCLExcl "/sysadmin/BA/ba-wos/bin/inclexcl" 1a) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore /wosgpfs/junk /wosgpfs/project 1b) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 2) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 3) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup -/wosgpfs/ignore -/wosgpfs/junk -/wosgpfs/project" In another words, all the contents under /wosgpfs are being traversed and going to the TSM backup. Furthermore, even with "-L 4" mmbackup is not logging the list of files being sent to the TSM backup anywhere on the client side. I only get that information from the TSM server side (get filespace). I know that all contents of /wosgpfs are being traversed because I have a tail on /wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update > > If that doesn't do it for your purposes... You're into some light > hacking... So look inside the mmbackup and tsbackup33 scripts and you'll > find some DEBUG variables that should allow for keeping work and temp > files around ... including the generated policy rules. > I'm calling this hacking "light", because I don't think you'll need to > change the scripts, but just look around and see how you can use what's > there to achieve your legitimate purposes. Even so, you will have crossed > a line where IBM support is "informal" at best. On the other hand I am having better luck with the customer rules file. 
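For reference, the mmbackup-wos.sh wrapper used in the run below is not shown in the thread; presumably it combines the earlier command line with the -P option Marc mentioned, along the lines of this sketch (only the -P flag and rules-file path are inferred, from the "user supplied policy rules" line in the log below; everything else is the original invocation):

# assumed contents of the mmbackup-wos.sh wrapper
mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm \
  -P /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs \
  --tsm-errorlog $logfile -L 4
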
The modified template below will traverse only the /wosgpfs/backmeup, as intended, and only backup files modified under that path. I guess I have a working solution that I will try at scale now. [root at wos-gateway02 bin]# cat dsm.opt SERVERNAME TAPENODE3 ARCHSYMLINKASFILE NO DOMAIN "/wosgpfs/backmeup" __________________________________________________________ /* Auto-generated GPFS policy rules file * Generated on Wed May 24 12:12:51 2017 */ /* Server rules for backup server 1 *** TAPENODE3 *** */ RULE EXTERNAL LIST 'mmbackup.1.TAPENODE3' EXEC '/wosgpfs/.mmbackupCfg/BAexecScript.wosgpfs' OPTS '"/wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update" "-servername=TAPENODE3" "-auditlogname=/wosgpfs/mmbackup.audit.wosgpfs.TAPENODE3" "NONE"' RULE 'BackupRule' LIST 'mmbackup.1.TAPENODE3' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' ELSE 'resdnt' END )) WHERE ( NOT ( (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (PATH_NAME LIKE '/%/.g2w/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/ignore/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/junk/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/project/%') OR /* DO NOT TRAVERSE OR BACKUP */ (MODE LIKE 's%') ) ) AND (PATH_NAME LIKE '/%/backmeup/%') /* TRAVERSE AND BACKUP */ AND (MISC_ATTRIBUTES LIKE '%u%') AND ( NOT ( (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') ) ) AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) _________________________________________________________ [root at wos-gateway02 bin]# time ./mmbackup-wos.sh -------------------------------------------------------- mmbackup: Backup of /wosgpfs begins at Mon May 29 15:54:47 EDT 2017. -------------------------------------------------------- Mon May 29 15:54:49 2017 mmbackup:using user supplied policy rules: /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs Mon May 29 15:54:49 2017 mmbackup:Scanning file system wosgpfs Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3]. Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3] Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired]. mmbackup: TSM Summary Information: Total number of objects inspected: 3 Total number of objects backed up: 3 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of bytes inspected: 4096 Total number of bytes transferred: 512 ---------------------------------------------------------- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. ---------------------------------------------------------- real 0m9.276s user 0m2.906s sys 0m3.212s _________________________________________________________ Thanks for all the help Jaime > > > > > From: Jez Tucker > To: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 03:33 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? 
was: > mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > When mmbackup has passed the preflight stage (pretty quickly) you'll > find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* > > Best, > > Jez > > > On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is using > as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > > thin air.... Capture the rules mmbackup creates and make small changes to > > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > > Plan.... Then do some dry run recoveries before you really "need" to do a > > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. > > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. It would be nice to get comments from > somebody familiar with the inner works of mmbackup. 
> > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that > the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support > team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused > and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such > thing. > Filesets are kinda like little filesystems within filesystems. Moving > a > file from one fileset to another requires a copy operation. There is > no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that > are > in a separable range of inode numbers - this allows GPFS to efficiently > do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be > represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you > may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor > ESS, > so anyone in this list feel free to give feedback on that page people > with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a > new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. 
exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > > Quoting "Luis Bolinches" : > > Hi > > have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as > inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, > and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > > > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
************************************
TELL US ABOUT YOUR SUCCESS STORIES
http://www.scinethpc.ca/testimonials
************************************
---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of Toronto.

From Tomasz.Wolski at ts.fujitsu.com  Mon May 29 21:23:12 2017
From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com)
Date: Mon, 29 May 2017 20:23:12 +0000
Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3
Message-ID: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local>

Hello,

We are planning to integrate the new IBM Spectrum Scale version 4.2.3 into our software, but our current software release has version 4.1.1 integrated. We are worried about how node-at-a-time updates would look when a customer wants to update his cluster from 4.1.1 to 4.2.3.

According to the "Concepts, Planning and Installation Guide" (for 4.2.3), there is limited compatibility between two GPFS versions, and if they are not adjacent, the following update path is advised:

"If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version:
V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x"

My question is: is the above statement true even though the nodes where the new GPFS 4.2.3 is installed will not be migrated to the latest release with "mmchconfig release=LATEST" until all nodes in the cluster have been updated to version 4.2.3?
In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) [cid:image002.gif at 01CE62B9.8ACFA960] FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2774 bytes Desc: image001.gif URL: From knop at us.ibm.com Tue May 30 03:54:04 2017 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 29 May 2017 22:54:04 -0400 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Tomasz, The statement below from "Concepts, Planning and Installation Guide" was found to be incorrect and is being withdrawn from the publications. The team is currently working on improvements to the guidance being provided for migration. For a cluster which is not running protocols like NFS/SMB/Object, migration of nodes one-at-a-time from 4.1.1 to 4.2.3 should work. Once all nodes are migrated to 4.2.3, command mmchconfig release=LATEST can be issued to move the cluster to the 4.2.3 level. Note that the command above will not change the file system level. The file system can be moved to the latest level with command mmchfs file-system-name -V full In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? That is expected to work. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 05/29/2017 04:24 PM Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to ?Concepts, Planning and Installation Guide? (for 4.2.3), there?s a limited compatibility between two GPFS versions and if they?re not adjacent, then following update path is advised: ?If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x? My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with ?mmchconfig release=LATEST? until all nodes in the cluster will have been updated to version 4.2.3? 
In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2774 bytes Desc: not available URL: From Achim.Rehor at de.ibm.com Tue May 30 08:42:23 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 30 May 2017 09:42:23 +0200 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2774 bytes Desc: not available URL: From andreas.petzold at kit.edu Tue May 30 13:16:40 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 14:16:40 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes Message-ID: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From john.hearns at asml.com Tue May 30 13:28:17 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 30 May 2017 12:28:17 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: Andreas, This is a stupid reply, but please bear with me. Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. However when two or more of the same application were running the job would take several hours. We finally found that this slowdown was due to the IO size, the application was using the default size. We only found this by stracing the application and spending hours staring at the trace... I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. A good tool to get a general feel for IO pattersn is 'iotop'. It might help? 
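If it does come down to stracing, a rough sketch like the one below at least gives a quick feel for the request sizes a single suspect process is issuing, without reading the whole trace (the PID and output path are placeholders):

# capture ~30 seconds of read/write activity from one suspect process
timeout 30 strace -f -p 12345 -e trace=read,write,pread64,pwrite64 -o /tmp/suspect.strace

# histogram of transfer sizes; a flood of 4096-byte results points at the culprit
grep -oE '= [0-9]+$' /tmp/suspect.strace | awk '{print $2}' | sort | uniq -c | sort -rn | head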
-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) Sent: Tuesday, May 30, 2017 2:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Associating I/O operations with files/processes Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu https://emea01.safelinks.protection.outlook.com/?url=www.scc.kit.edu&data=01%7C01%7Cjohn.hearns%40asml.com%7Cd3f8f819bf21408c419e08d4a755bde9%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=IwCAFwU6OI38yZK9cnmAcWpWD%2BlujeYDpgXuvvAdvVg%3D&reserved=0 KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From andreas.petzold at kit.edu Tue May 30 14:12:52 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 15:12:52 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Hi John, iotop wasn't helpful. It seems to be overwhelmed by what is going on on the machine. Cheers, Andreas On 05/30/2017 02:28 PM, John Hearns wrote: > Andreas, > This is a stupid reply, but please bear with me. > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. > We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. > However when two or more of the same application were running the job would take several hours. > > We finally found that this slowdown was due to the IO size, the application was using the default size. > We only found this by stracing the application and spending hours staring at the trace... > > I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. > A good tool to get a general feel for IO pattersn is 'iotop'. It might help? > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) > Sent: Tuesday, May 30, 2017 2:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > Dear group, > > first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. > > Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 > > Since a few days we see this on the problematic node: > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 > > The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. > > We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? > > Thanks, > > Andreas > > -- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From aaron.s.knister at nasa.gov Tue May 30 14:47:52 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Tue, 30 May 2017 13:47:52 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> , <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: <89666459-A01A-4B1D-BEDF-F742E8E888A9@nasa.gov> Hi Andreas, I often start with an lsof to see who has files open on the troubled filesystem and then start stracing the various processes to see which is responsible. It ought to be a process blocked in uninterruptible sleep and ideally would be obvious but on a shared machine it might not be. Something else you could do is a reverse lookup of the disk addresseses in iohist using mmfileid. This won't help if these are transient files but it could point you in the right direction. Careful though it'll give your metadata disks a tickle :) the syntax would be "mmfileid $FsName -d :$DiskAddrrss" where $DiskAddress is the 4th field from the iohist". It's not a quick command to return-- it could easily take up to a half hour but it should tell you which file path contains that disk address. Sometimes this is all too tedious and in that case grabbing some trace data can help. When you're experiencing I/O trouble you can run "mmtrace trace=def start" on the node, wait about a minute or so and then run "mmtrace stop". 
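Condensed into commands, that sequence would look roughly like this (the file system name fs1 is a placeholder, and the mmfileid syntax is as described above):

# who has files open on the troubled file system
lsof /gpfs/fs1

# reverse-map one disk address ($DiskAddress = 4th field of an iohist line,
# e.g. 46:126258287872) to a file path; slow, and it does exercise the metadata disks
mmfileid fs1 -d :$DiskAddress

# short trace capture while the problem is visible
mmtrace trace=def start
sleep 60
mmtrace stop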
The resulting trcrpt file is bit of a monster to go through but I do believe you can see which PIDs are responsible for the I/O given some sleuthing. If it comes to that let me know and I'll see if I can point you at some phrases to grep for. It's been a while since I've done it. -Aaron On May 30, 2017 at 09:13:09 EDT, Andreas Petzold (SCC) wrote: Hi John, iotop wasn't helpful. It seems to be overwhelmed by what is going on on the machine. Cheers, Andreas On 05/30/2017 02:28 PM, John Hearns wrote: > Andreas, > This is a stupid reply, but please bear with me. > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. > We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. > However when two or more of the same application were running the job would take several hours. > > We finally found that this slowdown was due to the IO size, the application was using the default size. > We only found this by stracing the application and spending hours staring at the trace... > > I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. > A good tool to get a general feel for IO pattersn is 'iotop'. It might help? > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) > Sent: Tuesday, May 30, 2017 2:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > Dear group, > > first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. > > Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 > > Since a few days we see this on the problematic node: > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 > > The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. > > We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? > > Thanks, > > Andreas > > -- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue May 30 14:55:30 2017 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 30 May 2017 13:55:30 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: Hi, the very first thing to do would be to do a mmfsadm dump iohist instead of mmdiag --iohist one time (we actually add this info in next release to mmdiag --iohist) to see if the thread type will reveal something : 07:25:53.578522 W data 1:20260249600 8192 35.930 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch WritebehindWorkerThread 07:25:53.632722 W data 1:20260257792 8192 45.179 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner CleanBufferThread 07:25:53.662067 W data 2:20259815424 8192 45.612 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch WritebehindWorkerThread 07:25:53.734274 W data 1:19601858560 8 0.624 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler *DioHandlerThread* if you see DioHandlerThread most likely somebody changed a openflag to use O_DIRECT . if you don't use that flag even the app does only 4k i/o which is inefficient GPFS will detect this and do prefetch writebehind in large blocks, as soon as you add O_DIRECT, we don't do this anymore to honor the hint and then every single request gets handled one by one. 
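As a quick way to see the difference, forcing direct I/O at a small block size from the shell reproduces the pattern (the file path is a placeholder; dd's iflag=direct sets O_DIRECT on the open):

# 4 KiB reads with O_DIRECT: every request is handled individually
dd if=/gpfs/fs1/somefile of=/dev/null bs=4k count=10000 iflag=direct

# the same reads without O_DIRECT: GPFS is free to prefetch in full blocks
dd if=/gpfs/fs1/somefile of=/dev/null bs=4k count=10000

# then compare the request sizes and thread names afterwards
mmfsadm dump iohist | tail -n 20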
after that the next thing would be to run a very low level trace with just IO infos like : mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N . this will start collection on the node you execute the command if you want to run it against a different node replace the dot at the end with the hostname . wait a few seconds and run mmtracectl --off you will get a message that the trace gets formated and a eventually a trace file . now grep for FIO and you get lines like this : 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 nSectors 32768 err 0 if you further reduce it to nSectors 8 you would focus only on your 4k writes you mentioned above. the key item in the line above you care about is tag 16... this is the inode number of your file. if you now do : cd /usr/lpp/mmfs/samples/util ; make then run (replace -i and filesystem path obviously) [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/ and you get a hit like this : 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf you now know the file that is being accessed in the I/O example above is /ibm/fs2-16m-09//shared/test-newbuf hope that helps. sven On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) < andreas.petzold at kit.edu> wrote: > Hi John, > > iotop wasn't helpful. It seems to be overwhelmed by what is going on on > the machine. > > Cheers, > > Andreas > > On 05/30/2017 02:28 PM, John Hearns wrote: > > Andreas, > > This is a stupid reply, but please bear with me. > > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS > filesystem) setup. > > We also had a new application which did post-processing One of the users > reported that a post-processing job would take about 30 minutes. > > However when two or more of the same application were running the job > would take several hours. > > > > We finally found that this slowdown was due to the IO size, the > application was using the default size. > > We only found this by stracing the application and spending hours > staring at the trace... > > > > I am sure there are better tools for this, and I do hope you don?t have > to strace every application.... really. > > A good tool to get a general feel for IO pattersn is 'iotop'. It might > help? > > > > > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold > (SCC) > > Sent: Tuesday, May 30, 2017 2:17 PM > > To: gpfsug-discuss at spectrumscale.org > > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > > > Dear group, > > > > first a quick introduction: at KIT we are running a 20+PB storage system > with several large (1-9PB) file systems. We have a 14 node NSD server > cluster and 5 small (~10 nodes) protocol node clusters which each mount one > of the file systems. The protocol nodes run server software (dCache, > xrootd) specific to our users which primarily are the LHC experiments at > CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, > while the protocol nodes communicate via Ethernet to their clients. > > > > Now let me describe the problem we are facing. Since a few days, one of > the protocol nodes shows a very strange and as of yet unexplained I/O > behaviour. 
Before we were usually seeing reads like this (iohist example > from a well behaved node): > > > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli > 0A417D79:58E3B179 172.18.224.19 > > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli > 0A4179AD:58E3AE66 172.18.224.14 > > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli > 0A4179AD:58E3ADD7 172.18.224.14 > > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli > 0A417DAD:58E3B266 172.18.224.21 > > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli > 0A417DAD:58E3B206 172.18.224.21 > > > > Since a few days we see this on the problematic node: > > > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.268626 R data 40:137280768624 8 0.395 cli > 0A4179AD:58E3ADE3 172.18.224.14 > > 14:06:27.269056 R data 46:56452781528 8 0.427 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269417 R data 47:97273159640 8 0.293 cli > 0A4179AD:58E3AE5A 172.18.224.14 > > 14:06:27.269293 R data 49:59102786168 8 0.425 cli > 0A4179AD:58E3AE72 172.18.224.14 > > 14:06:27.269531 R data 46:142387326944 8 0.340 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269377 R data 28:102988517096 8 0.554 cli > 0A417879:58E3AD08 172.18.224.10 > > > > The number of read ops has gone up by O(1000) which is what one would > expect when going from 8192 sector reads to 8 sector reads. > > > > We have already excluded problems of node itself so we are focusing on > the applications running on the node. What we'd like to to is to associate > the I/O requests either with files or specific processes running on the > machine in order to be able to blame the correct application. Can somebody > tell us, if this is possible and if now, if there are other ways to > understand what application is causing this? > > > > Thanks, > > > > Andreas > > > > -- > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.petzold at kit.edu Tue May 30 15:00:27 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 16:00:27 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: <45aa5c60-4a79-015a-7236-556b7834714f@kit.edu> Hi Sven, we are seeing FileBlockRandomReadFetchHandlerThread. I'll let you know once I have more results Thanks, Andreas On 05/30/2017 03:55 PM, Sven Oehme wrote: > Hi, > > the very first thing to do would be to do a mmfsadm dump iohist instead > of mmdiag --iohist one time (we actually add this info in next release > to mmdiag --iohist) to see if the thread type will reveal something : > > 07:25:53.578522 W data 1:20260249600 8192 35.930 > 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch > WritebehindWorkerThread > 07:25:53.632722 W data 1:20260257792 8192 45.179 > 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner > CleanBufferThread > 07:25:53.662067 W data 2:20259815424 8192 45.612 > 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch > WritebehindWorkerThread > 07:25:53.734274 W data 1:19601858560 8 0.624 > 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler > *_DioHandlerThread_* > > if you see DioHandlerThread most likely somebody changed a openflag to > use O_DIRECT . 
if you don't use that flag even the app does only 4k i/o > which is inefficient GPFS will detect this and do prefetch writebehind > in large blocks, as soon as you add O_DIRECT, we don't do this anymore > to honor the hint and then every single request gets handled one by one. > > after that the next thing would be to run a very low level trace with > just IO infos like : > > mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N . > > this will start collection on the node you execute the command if you > want to run it against a different node replace the dot at the end with > the hostname . > wait a few seconds and run > > mmtracectl --off > > you will get a message that the trace gets formated and a eventually a > trace file . > now grep for FIO and you get lines like this : > > > 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize > 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 > nSectors 32768 err 0 > > if you further reduce it to nSectors 8 you would focus only on your 4k > writes you mentioned above. > > the key item in the line above you care about is tag 16... this is the > inode number of your file. > if you now do : > > cd /usr/lpp/mmfs/samples/util ; make > then run (replace -i and filesystem path obviously) > > [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/ > > and you get a hit like this : > > 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf > > you now know the file that is being accessed in the I/O example above is > /ibm/fs2-16m-09//shared/test-newbuf > > hope that helps. > > sven > > > > > On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) > > wrote: > > Hi John, > > iotop wasn't helpful. It seems to be overwhelmed by what is going on on > the machine. > > Cheers, > > Andreas > > On 05/30/2017 02:28 PM, John Hearns wrote: > > Andreas, > > This is a stupid reply, but please bear with me. > > Not exactly GPFS related, but I once managed an SGI CXFS > (Clustered XFS filesystem) setup. > > We also had a new application which did post-processing One of the > users reported that a post-processing job would take about 30 minutes. > > However when two or more of the same application were running the > job would take several hours. > > > > We finally found that this slowdown was due to the IO size, the > application was using the default size. > > We only found this by stracing the application and spending hours > staring at the trace... > > > > I am sure there are better tools for this, and I do hope you don?t > have to strace every application.... really. > > A good tool to get a general feel for IO pattersn is 'iotop'. It > might help? > > > > > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at spectrumscale.org > > [mailto:gpfsug-discuss-bounces at spectrumscale.org > ] On Behalf Of > Andreas Petzold (SCC) > > Sent: Tuesday, May 30, 2017 2:17 PM > > To: gpfsug-discuss at spectrumscale.org > > > Subject: [gpfsug-discuss] Associating I/O operations with > files/processes > > > > Dear group, > > > > first a quick introduction: at KIT we are running a 20+PB storage > system with several large (1-9PB) file systems. We have a 14 node > NSD server cluster and 5 small (~10 nodes) protocol node clusters > which each mount one of the file systems. The protocol nodes run > server software (dCache, xrootd) specific to our users which > primarily are the LHC experiments at CERN. GPFS version is 4.2.2 > everywhere. 
All servers are connected via IB, while the protocol > nodes communicate via Ethernet to their clients. > > > > Now let me describe the problem we are facing. Since a few days, > one of the protocol nodes shows a very strange and as of yet > unexplained I/O behaviour. Before we were usually seeing reads like > this (iohist example from a well behaved node): > > > > 14:03:37.637526 R data 32:138835918848 8192 46.626 > cli 0A417D79:58E3B179 172.18.224.19 > > 14:03:37.660177 R data 18:12590325760 8192 25.498 > cli 0A4179AD:58E3AE66 172.18.224.14 > > 14:03:37.640660 R data 15:106365067264 8192 45.682 > cli 0A4179AD:58E3ADD7 172.18.224.14 > > 14:03:37.657006 R data 35:130482421760 8192 30.872 > cli 0A417DAD:58E3B266 172.18.224.21 > > 14:03:37.643908 R data 33:107847139328 8192 45.571 > cli 0A417DAD:58E3B206 172.18.224.21 > > > > Since a few days we see this on the problematic node: > > > > 14:06:27.253537 R data 46:126258287872 8 15.474 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.268626 R data 40:137280768624 8 0.395 > cli 0A4179AD:58E3ADE3 172.18.224.14 > > 14:06:27.269056 R data 46:56452781528 8 0.427 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269417 R data 47:97273159640 8 0.293 > cli 0A4179AD:58E3AE5A 172.18.224.14 > > 14:06:27.269293 R data 49:59102786168 8 0.425 > cli 0A4179AD:58E3AE72 172.18.224.14 > > 14:06:27.269531 R data 46:142387326944 8 0.340 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269377 R data 28:102988517096 8 0.554 > cli 0A417879:58E3AD08 172.18.224.10 > > > > The number of read ops has gone up by O(1000) which is what one > would expect when going from 8192 sector reads to 8 sector reads. > > > > We have already excluded problems of node itself so we are > focusing on the applications running on the node. What we'd like to > to is to associate the I/O requests either with files or specific > processes running on the machine in order to be able to blame the > correct application. Can somebody tell us, if this is possible and > if now, if there are other ways to understand what application is > causing this? > > > > Thanks, > > > > Andreas > > > > -- > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From makaplan at us.ibm.com Tue May 30 15:39:50 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 May 2017 14:39:50 +0000 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? 
was: mmbackup with fileset : scope errors In-Reply-To: <20170529160138.18847jpj5x9kz8ki@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Regarding mmbackup and TSM INCLUDE/EXCLUDE, I found this doc by googling... http://www-01.ibm.com/support/docview.wss?uid=swg21699569 Which says, among other things and includes many ifs,and,buts : "... include and exclude options are interpreted differently by the IBM Spectrum Scale mmbackup command and by the IBM Spectrum Protect backup-archive client..." I think mmbackup tries to handle usual, sensible, variants of the TSM directives that can be directly "translated" to more logical SQL, so you don't have to follow all the twists, but if it isn't working as you expected... RTFM... OTOH... If you are like or can work with the customize-the-policy-rules approach -- that is good too and makes possible finer grain controls. From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/29/2017 04:01 PM Subject: Re: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors Quoting "Marc A Kaplan" : > Easier than hacking mmbackup or writing/editing policy rules, > > mmbackup interprets > your TSM INCLUDE/EXCLUDE configuration statements -- so that is a > supported and recommended way of doing business... Finally got some time to resume testing on this Here is the syntax used (In this test I want to backup /wosgpfs/backmeup only) mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm --tsm-errorlog $logfile -L 4 As far as I can tell, the EXCLUDE statements in the TSM configuration (dsm.opt) are being *ignored*. I tried a couple of formats: 1) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" INCLExcl "/sysadmin/BA/ba-wos/bin/inclexcl" 1a) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore /wosgpfs/junk /wosgpfs/project 1b) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 2) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 3) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup -/wosgpfs/ignore -/wosgpfs/junk -/wosgpfs/project" In another words, all the contents under /wosgpfs are being traversed and going to the TSM backup. Furthermore, even with "-L 4" mmbackup is not logging the list of files being sent to the TSM backup anywhere on the client side. I only get that information from the TSM server side (get filespace). I know that all contents of /wosgpfs are being traversed because I have a tail on /wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update > > If that doesn't do it for your purposes... You're into some light > hacking... So look inside the mmbackup and tsbackup33 scripts and you'll > find some DEBUG variables that should allow for keeping work and temp > files around ... including the generated policy rules. > I'm calling this hacking "light", because I don't think you'll need to > change the scripts, but just look around and see how you can use what's > there to achieve your legitimate purposes. 
Even so, you will have crossed > a line where IBM support is "informal" at best. On the other hand I am having better luck with the customer rules file. The modified template below will traverse only the /wosgpfs/backmeup, as intended, and only backup files modified under that path. I guess I have a working solution that I will try at scale now. [root at wos-gateway02 bin]# cat dsm.opt SERVERNAME TAPENODE3 ARCHSYMLINKASFILE NO DOMAIN "/wosgpfs/backmeup" __________________________________________________________ /* Auto-generated GPFS policy rules file * Generated on Wed May 24 12:12:51 2017 */ /* Server rules for backup server 1 *** TAPENODE3 *** */ RULE EXTERNAL LIST 'mmbackup.1.TAPENODE3' EXEC '/wosgpfs/.mmbackupCfg/BAexecScript.wosgpfs' OPTS '"/wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update" "-servername=TAPENODE3" "-auditlogname=/wosgpfs/mmbackup.audit.wosgpfs.TAPENODE3" "NONE"' RULE 'BackupRule' LIST 'mmbackup.1.TAPENODE3' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' ELSE 'resdnt' END )) WHERE ( NOT ( (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (PATH_NAME LIKE '/%/.g2w/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/ignore/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/junk/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/project/%') OR /* DO NOT TRAVERSE OR BACKUP */ (MODE LIKE 's%') ) ) AND (PATH_NAME LIKE '/%/backmeup/%') /* TRAVERSE AND BACKUP */ AND (MISC_ATTRIBUTES LIKE '%u%') AND ( NOT ( (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') ) ) AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) _________________________________________________________ [root at wos-gateway02 bin]# time ./mmbackup-wos.sh -------------------------------------------------------- mmbackup: Backup of /wosgpfs begins at Mon May 29 15:54:47 EDT 2017. -------------------------------------------------------- Mon May 29 15:54:49 2017 mmbackup:using user supplied policy rules: /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs Mon May 29 15:54:49 2017 mmbackup:Scanning file system wosgpfs Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3]. Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3] Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired]. mmbackup: TSM Summary Information: Total number of objects inspected: 3 Total number of objects backed up: 3 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of bytes inspected: 4096 Total number of bytes transferred: 512 ---------------------------------------------------------- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. 
---------------------------------------------------------- real 0m9.276s user 0m2.906s sys 0m3.212s _________________________________________________________ Thanks for all the help Jaime > > > > > From: Jez Tucker > To: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 03:33 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > When mmbackup has passed the preflight stage (pretty quickly) you'll > find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* > > Best, > > Jez > > > On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is using > as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > > thin air.... Capture the rules mmbackup creates and make small changes to > > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > > Plan.... Then do some dry run recoveries before you really "need" to do a > > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. > > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. 
It would be nice to get comments from > somebody familiar with the inner works of mmbackup. > > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that > the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support > team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused > and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such > thing. > Filesets are kinda like little filesystems within filesystems. Moving > a > file from one fileset to another requires a copy operation. There is > no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that > are > in a separable range of inode numbers - this allows GPFS to efficiently > do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be > represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you > may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor > ESS, > so anyone in this list feel free to give feedback on that page people > with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a > new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. 
exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > > Quoting "Luis Bolinches" : > > Hi > > have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as > inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, > and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > > > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Jez Tucker > Head of Research and Development, Pixit Media > 07764193820 | jtucker at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia.com > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue May 30 16:15:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 May 2017 11:15:11 -0400 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: In version 4.2.3 you can turn on QOS --fine-stats and --pid-stats and get IO operations statistics for each active process on each node. https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmlsqos.htm The statistics allow you to distinguish single sector IOPS from partial block multisector iops from full block multisector iops. Notice that to use this feature you must enable QOS, but by default you start by running with all throttles set at "unlimited". There are some overheads, so you might want to use it only when you need to find the "bad" processes. 
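A possible command-line sketch of the above, assuming a file system named gpfs0 (hypothetical) and taking the option names from this note; the exact argument forms should be verified against the mmchqos/mmlsqos man pages for your code level:

# enable QoS with every class left at "unlimited" (no actual throttling) and
# start collecting the fine-grained per-process I/O statistics
mmchqos gpfs0 --enable --fine-stats 60 --pid-stats yes

# let the suspect workload run, then dump the per-process statistics (CSV) and
# look for processes issuing large numbers of small, single-sector I/Os
mmlsqos gpfs0 --fine-stats 60 > /tmp/gpfs0-qos-fine.csv

# switch the collection off again afterwards, since it carries some overhead
mmchqos gpfs0 --disable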
It's a little tricky to use effectively, but we give you a sample script that shows some ways to produce, massage and filter the raw data: samples/charts/qosplotfine.pl The data is available in a CSV format, so it's easy to feed into spreadsheets or data bases and crunch... --marc of GPFS. From: "Andreas Petzold (SCC)" To: Date: 05/30/2017 08:17 AM Subject: [gpfsug-discuss] Associating I/O operations with files/processes Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. [attachment "smime.p7s" deleted by Marc A Kaplan/Watson/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Tomasz.Wolski at ts.fujitsu.com Wed May 31 10:33:29 2017 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Wed, 31 May 2017 09:33:29 +0000 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: <5564b22a89744e06ad7003607248f279@R01UKEXCASM223.r01.fujitsu.local> Thank you very much - that?s very helpful and will save us a lot of effort :) Best regards, Tomasz Wolski From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Achim Rehor Sent: Tuesday, May 30, 2017 9:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 The statement was always to be release n-1 compatible. with release being (VRMF) 4.2.3.0 so all 4.2 release levels ought to be compatible with all 4.1 levels. As Felipe pointed out below, the mmchconfig RELEASE=latest will not touch the filesystem level. And if you are running remote clusters, you need to be aware, that lifting a filesystem to the latest level (mmchfs -V full) you will loose remote clusters mount ability if they are on a lower level. in these cases use the -V compat flag (and see commands refernce for details) Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:image001.gif at 01D2DA01.B94BC9E0] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke, Dieter Scholz, Gregor Pillen, Ivo Koerner, Christian Noll Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Felipe Knop" > To: gpfsug main discussion list > Date: 05/30/2017 04:54 AM Subject: Re: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Tomasz, The statement below from "Concepts, Planning and Installation Guide" was found to be incorrect and is being withdrawn from the publications. The team is currently working on improvements to the guidance being provided for migration. For a cluster which is not running protocols like NFS/SMB/Object, migration of nodes one-at-a-time from 4.1.1 to 4.2.3 should work. Once all nodes are migrated to 4.2.3, command mmchconfig release=LATEST can be issued to move the cluster to the 4.2.3 level. Note that the command above will not change the file system level. The file system can be moved to the latest level with command mmchfs file-system-name -V full In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? That is expected to work. 
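A rough sketch of that sequence, using hypothetical names (node01, fs1) and leaving the actual 4.2.3 package installation to whatever mechanism the site normally uses:

# upgrade one node at a time while the rest of the cluster keeps running
mmshutdown -N node01
# ... install the 4.2.3 packages on node01 ...
mmstartup -N node01

# once every node in the cluster is running 4.2.3:
mmchconfig release=LATEST

# finally raise the file system format level; use "-V compat" instead of
# "-V full" if remote clusters at a lower code level still need to mount it
mmchfs fs1 -V full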
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Tomasz.Wolski at ts.fujitsu.com" > To: "gpfsug-discuss at spectrumscale.org" > Date: 05/29/2017 04:24 PM Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to ?Concepts, Planning and Installation Guide? (for 4.2.3), there?s a limited compatibility between two GPFS versions and if they?re not adjacent, then following update path is advised: ?If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x? My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with ?mmchconfig release=LATEST? until all nodes in the cluster will have been updated to version 4.2.3? In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) [cid:image002.gif at 01CE62B9.8ACFA960] FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 7182 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 2774 bytes Desc: image002.gif URL: From Tomasz.Wolski at ts.fujitsu.com Wed May 31 11:00:02 2017 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Wed, 31 May 2017 10:00:02 +0000 Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/ Message-ID: <8b209bc526024c49a4a002608f354b3c@R01UKEXCASM223.r01.fujitsu.local> Hello All, It seems that GPFS 4.2.3 does not create block device under /dev for new filesystems anymore - is this behavior intended? In manuals, there's nothing mentioned about this change. For example, having GPFS filesystem gpfs100 with mountpoint /cache/100, /proc/mounts has following entry: gpfs100 /cache/100 gpfs rw,relatime 0 0 where in older releases it used to be /dev/gpfs100 /cache/100 gpfs rw,relatime 0 0 Is there any option (i.e. supplied for mmcrfs) to have these device in /dev/ still in version 4.2.3? 
With best regards / Mit freundlichen Grüßen / Pozdrawiam

Tomasz Wolski
Development Engineer
NDC Eternus CS HE (ET2)
[cid:image002.gif at 01CE62B9.8ACFA960]
FUJITSU
Fujitsu Technology Solutions Sp. z o.o.
Textorial Park Bldg C, ul. Fabryczna 17
90-344 Lodz, Poland

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2774 bytes
Desc: image001.gif
URL:

From Robert.Oesterlin at nuance.com Wed May 31 12:13:01 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 31 May 2017 11:13:01 +0000
Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
Message-ID: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com>

This was a documented change back in (I think) GPFS 4.2.0, but I'd have to go back over the old release notes. It can't be changed.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of "Tomasz.Wolski at ts.fujitsu.com"
Reply-To: gpfsug main discussion list
Date: Wednesday, May 31, 2017 at 5:00 AM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Missing gpfs filesystem device under /dev/

It seems that GPFS 4.2.3 does not create block device under /dev for new filesystems anymore - is this behavior intended? In manuals, there's nothing mentioned about this change.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stockf at us.ibm.com Wed May 31 12:25:13 2017
From: stockf at us.ibm.com (Frederick Stock)
Date: Wed, 31 May 2017 07:25:13 -0400
Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
In-Reply-To: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com>
References: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com>
Message-ID:

The change actually occurred in 4.2.1 to better integrate GPFS with systemd on RHEL 7.x.

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com

From: "Oesterlin, Robert"
To: gpfsug main discussion list
Date: 05/31/2017 07:13 AM
Subject: Re: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
Sent by: gpfsug-discuss-bounces at spectrumscale.org

This was a documented change back in (I think) GPFS 4.2.0, but I'd have to go back over the old release notes. It can't be changed.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of "Tomasz.Wolski at ts.fujitsu.com"
Reply-To: gpfsug main discussion list
Date: Wednesday, May 31, 2017 at 5:00 AM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Missing gpfs filesystem device under /dev/

It seems that GPFS 4.2.3 does not create block device under /dev for new filesystems anymore - is this behavior intended? In manuals, there's nothing mentioned about this change.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
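For anyone checking this behaviour on their own cluster, a minimal sketch (gpfs100 is the file system name from the example above; substitute your own):

# on 4.2.1 and later there is no /dev/gpfs100 node, so query GPFS directly
mmlsmount all -L             # which nodes currently have each file system mounted
mmlsfs gpfs100 -T            # the configured default mount point
grep gpfs100 /proc/mounts    # the mount record lists the device as "gpfs100", not "/dev/gpfs100"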
From SAnderson at convergeone.com Wed May 3 18:08:36 2017
From: SAnderson at convergeone.com (Shaun Anderson)
Date: Wed, 3 May 2017 17:08:36 +0000
Subject: [gpfsug-discuss] Tiebreaker disk question
Message-ID: <1493831316163.52984@convergeone.com>

We noticed some odd behavior recently. I have a customer with a small Scale (with Archive on top) configuration that we recently updated to a dual node configuration. We are using CES and setup a very small 3 nsd shared-root filesystem(gpfssr). We also set up tiebreaker disks and figured it would be ok to use the gpfssr NSDs for this purpose. When we tried to perform some basic failover testing, both nodes came down.
It appears from the logs that when we initiated the node failure (via mmshutdown command...not great, I know) it unmounts and remounts the shared-root filesystem. When it did this, the cluster lost access to the tiebreaker disks, figured it had lost quorum and the other node came down as well. We got around this by changing the tiebreaker disks to our other normal gpfs filesystem. After that failover worked as expected. This is documented nowhere as far as I could find?. I wanted to know if anybody else had experienced this and if this is expected behavior. All is well now and operating as we want so I don't think we'll pursue a support request. Regards, SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments here to may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Thu May 4 06:27:11 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 04 May 2017 05:27:11 +0000 Subject: [gpfsug-discuss] Tiebreaker disk question In-Reply-To: <1493831316163.52984@convergeone.com> References: <1493831316163.52984@convergeone.com> Message-ID: This doesn't sound like normal behaviour. It shouldn't matter which filesystem your tiebreaker disks belong to. I think the failure was caused by something else, but am not able to guess from the little information you posted.. The mmfs.log will probably tell you the reason. -jf ons. 3. mai 2017 kl. 19.08 skrev Shaun Anderson : > We noticed some odd behavior recently. I have a customer with a small > Scale (with Archive on top) configuration that we recently updated to a > dual node configuration. We are using CES and setup a very small 3 > nsd shared-root filesystem(gpfssr). We also set up tiebreaker disks and > figured it would be ok to use the gpfssr NSDs for this purpose. > > > When we tried to perform some basic failover testing, both nodes came > down. It appears from the logs that when we initiated the node failure > (via mmshutdown command...not great, I know) it unmounts and remounts the > shared-root filesystem. When it did this, the cluster lost access to the > tiebreaker disks, figured it had lost quorum and the other node came down > as well. > > > We got around this by changing the tiebreaker disks to our other normal > gpfs filesystem. After that failover worked as expected. This is > documented nowhere as far as I could find?. I wanted to know if anybody > else had experienced this and if this is expected behavior. All is well > now and operating as we want so I don't think we'll pursue a support > request. > > > Regards, > > *SHAUN ANDERSON* > STORAGE ARCHITECT > O 208.577.2112 > M 214.263.7014 > > > NOTICE: This email message and any attachments here to may contain > confidential > information. Any unauthorized review, use, disclosure, or distribution of > such > information is prohibited. If you are not the intended recipient, please > contact > the sender by reply email and destroy the original message and all copies > of it. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Thu May 4 08:56:09 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 09:56:09 +0200 Subject: [gpfsug-discuss] Tiebreaker disk question In-Reply-To: References: <1493831316163.52984@convergeone.com> Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 4 14:15:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 4 May 2017 13:15:40 +0000 Subject: [gpfsug-discuss] HAWC question Message-ID: Hi, I have a question about HAWC, we are trying to enable this for our OpenStack environment, system pool is on SSD already, so we try to change the log file size with: mmchfs FSNAME -L 128M This says: mmchfs: Attention: You must restart the GPFS daemons before the new log file size takes effect. The GPFS daemons can be restarted one node at a time. When the GPFS daemon is restarted on the last node in the cluster, the new log size becomes effective. We multi-cluster the file-system, so do we have to restart every node in all clusters, or just in the storage cluster? And how do we tell once it has become active? Thanks Simon From kenneth.waegeman at ugent.be Thu May 4 14:22:25 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 4 May 2017 15:22:25 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi, We found out using ib_read_bw and ib_write_bw that there were some links between server and clients degraded, having a bandwith of 350MB/s strangely, nsdperf did not report the same. It reported 12GB/s write and 9GB/s read, which was much more then we actually could achieve. So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! Many thanks for the help !! We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? Cheers! Kenneth On 21/04/17 22:27, Kumaran Rajaram wrote: > Hi Kenneth, > > As it was mentioned earlier, it will be good to first verify the raw > network performance between the NSD client and NSD server using the > nsdperf tool that is built with RDMA support. > g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C > > In addition, since you have 2 x NSD servers it will be good to perform > NSD client file-system performance test with just single NSD server > (mmshutdown the other server, assuming all the NSDs have primary, > server NSD server configured + Quorum will be intact when a NSD server > is brought down) to see if it helps to improve the read performance + > if there are variations in the file-system read bandwidth results > between NSD_server#1 'active' vs. NSD_server #2 'active' (with other > NSD server in GPFS "down" state). If there is significant variation, > it can help to isolate the issue to particular NSD server (HW or IB > issue?). > > You can issue "mmdiag --waiters" on NSD client as well as NSD servers > during your dd test, to verify if there are unsual long GPFS waiters. 
> In addition, you may issue Linux "perf top -z" command on the GPFS > node to see if there is high CPU usage by any particular call/event > (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been > set to low value from the default 16M, then it can cause RDMA > completion threads to go CPU bound ). Please verify some performance > scenarios detailed in Chapter 22 in Spectrum Scale Problem > Determination Guide (link below). > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc > > Thanks, > -Kums > > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/21/2017 11:43 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Hi, > > We already verified this on our nsds: > > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed > QpiSpeed=maxdatarate > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode > turbomode=enable > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile > SysProfile=perfoptimized > > so sadly this is not the issue. > > Also the output of the verbs commands look ok, there are connections > from the client to the nsds are there is data being read and writen. > > Thanks again! > > Kenneth > > > On 21/04/17 16:01, Kumaran Rajaram wrote: > Hi, > > Try enabling the following in the BIOS of the NSD servers (screen > shots below) > > * Turbo Mode - Enable > * QPI Link Frequency - Max Performance > * Operating Mode - Maximum Performance > * >>>>While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total > (each nsd server seems limited by 0,5GB/s) independent of the > number of clients > > >>We are testing from 2 testing machines connected to the nsds > with infiniband, verbs enabled. > > > Also, It will be good to verify that all the GPFS nodes have Verbs > RDMA started using "mmfsadm test verbs status" and that the NSD > client-server communication from client to server during "dd" is > actually using Verbs RDMA using "mmfsadm test verbs conn" command (on > NSD client doing dd). If not, then GPFS might be using TCP/IP network > over which the cluster is configured impacting performance (If this is > the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and > resolve). > > * > > > > > > > Regards, > -Kums > > > > > > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > __ > To: gpfsug main discussion list __ > > Date: 04/21/2017 09:11 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: _gpfsug-discuss-bounces at spectrumscale.org_ > > ------------------------------------------------------------------------ > > > > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. 
> > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > We are running a test setup with 2 NSD Servers backed by 4 Dell > Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of > the 4 powervaults, nsd02 is primary serving LUNS of controller B. > > We are testing from 2 testing machines connected to the nsds with > infiniband, verbs enabled. > > When we do dd from the NSD servers, we see indeed performance going to > 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is > able to get the data at a decent speed. Since we can write from the > clients at a good speed, I didn't suspect the communication between > clients and nsds being the issue, especially since total performance > stays the same using 1 or multiple clients. > > I'll use the nsdperf tool to see if we can find anything, > > thanks! > > K > > On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > _https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf_ > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > Having an issue that looks the same as this one: > > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is the expected speed for our current storage > backend. While we have even better performance with sequential reads > on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each > nsd server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed > in this thread, but nothing seems to impact this read performance. > > Any ideas? > > Thanks! > > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > <_aaron.s.knister at nasa.gov_ >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load > up on one socket, you push all the interrupt handling to the other > socket where the fabric card is attached? > > > > Dunno ... 
(Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: _gpfsug-discuss-bounces at spectrumscale.org_ > [_gpfsug-discuss-bounces at spectrumscale.org_ > ] on behalf of Aaron > Knister [_aaron.s.knister at nasa.gov_ ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
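A sketch of the artificial-load trick described above; the thread count of 16 simply matches the core count mentioned here and is otherwise arbitrary:

# note the read throughput from a client first, then add CPU load on the NSD server
openssl speed -multi 16 >/dev/null 2>&1 &
# re-run the same read test and compare; also check which C-states are allowed
cpupower idle-info
# stop the load generator when finished
kill %1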
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _spectrumscale.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From oehmes at gmail.com Thu May 4 14:28:20 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 04 May 2017 13:28:20 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: Message-ID: well, it's a bit complicated which is why the message is there in the first place. reason is, there is no easy way to tell except by dumping the stripgroup on the filesystem manager and check what log group your particular node is assigned to and then check the size of the log group. as soon as the client node gets restarted it should in most cases pick up a new log group and that should be at the new size, but to be 100% sure we say all nodes need to be restarted. 
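A sketch of the rolling restart this implies; the node name is a placeholder, and going one node at a time keeps the file system available on the remaining nodes:

mmshutdown -N client01
mmstartup -N client01
mmgetstate -N client01    # wait for "active" before moving to the next node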
you need to also turn HAWC on as well, i assume you just left this out of the email , just changing log size doesn't turn it on :-) On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Hi, > > I have a question about HAWC, we are trying to enable this for our > OpenStack environment, system pool is on SSD already, so we try to change > the log file size with: > > mmchfs FSNAME -L 128M > > This says: > > mmchfs: Attention: You must restart the GPFS daemons before the new log > file > size takes effect. The GPFS daemons can be restarted one node at a time. > When the GPFS daemon is restarted on the last node in the cluster, the new > log size becomes effective. > > > We multi-cluster the file-system, so do we have to restart every node in > all clusters, or just in the storage cluster? > > And how do we tell once it has become active? > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 4 14:39:33 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 4 May 2017 13:39:33 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: , Message-ID: Which cluster though? The client and storage are separate clusters, so all the nodes on the remote cluster or storage cluster? Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of oehmes at gmail.com [oehmes at gmail.com] Sent: 04 May 2017 14:28 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HAWC question well, it's a bit complicated which is why the message is there in the first place. reason is, there is no easy way to tell except by dumping the stripgroup on the filesystem manager and check what log group your particular node is assigned to and then check the size of the log group. as soon as the client node gets restarted it should in most cases pick up a new log group and that should be at the new size, but to be 100% sure we say all nodes need to be restarted. you need to also turn HAWC on as well, i assume you just left this out of the email , just changing log size doesn't turn it on :-) On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) > wrote: Hi, I have a question about HAWC, we are trying to enable this for our OpenStack environment, system pool is on SSD already, so we try to change the log file size with: mmchfs FSNAME -L 128M This says: mmchfs: Attention: You must restart the GPFS daemons before the new log file size takes effect. The GPFS daemons can be restarted one node at a time. When the GPFS daemon is restarted on the last node in the cluster, the new log size becomes effective. We multi-cluster the file-system, so do we have to restart every node in all clusters, or just in the storage cluster? And how do we tell once it has become active? 
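On the "how do we tell" point, a hedged sketch of what can be queried once the daemons have been recycled (FSNAME as in the post above; the threshold commands are included because HAWC itself is switched on by the write-cache threshold, not by the log size alone):

# log file size currently recorded for the file system
mmlsfs FSNAME -L
# HAWC write-cache threshold; 0 means HAWC is still disabled
mmlsfs FSNAME --write-cache-threshold
# enable HAWC for small synchronous writes, up to 64 KiB
mmchfs FSNAME --write-cache-threshold 64K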
Thanks

Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oehmes at gmail.com Thu May 4 15:06:10 2017
From: oehmes at gmail.com (Sven Oehme)
Date: Thu, 04 May 2017 14:06:10 +0000
Subject: [gpfsug-discuss] HAWC question
In-Reply-To:
References:
Message-ID:

Let me clarify and get back; I am not 100% sure on a cross-cluster setup. I think the main point was that the FS manager for that file system should be reassigned (which could also happen via mmchmgr) and then the individual clients that mount that file system restarted, but I will double check and reply later.

On Thu, May 4, 2017 at 6:39 AM Simon Thompson (IT Research Support) <S.J.Thompson at bham.ac.uk> wrote:
> Which cluster though? The client and storage are separate clusters, so all
> the nodes on the remote cluster or storage cluster?
>
> Thanks
>
> Simon
> ________________________________________
> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of oehmes at gmail.com [oehmes at gmail.com]
> Sent: 04 May 2017 14:28
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] HAWC question
>
> well, it's a bit complicated which is why the message is there in the
> first place.
>
> reason is, there is no easy way to tell except by dumping the stripe group
> on the filesystem manager and check what log group your particular node is
> assigned to and then check the size of the log group.
>
> as soon as the client node gets restarted it should in most cases pick up
> a new log group and that should be at the new size, but to be 100% sure we
> say all nodes need to be restarted.
>
> you need to also turn HAWC on as well, I assume you just left this out of
> the email, just changing log size doesn't turn it on :-)
>
> On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) <S.J.Thompson at bham.ac.uk> wrote:
> Hi,
>
> I have a question about HAWC, we are trying to enable this for our
> OpenStack environment, system pool is on SSD already, so we try to change
> the log file size with:
>
> mmchfs FSNAME -L 128M
>
> This says:
>
> mmchfs: Attention: You must restart the GPFS daemons before the new log file
> size takes effect. The GPFS daemons can be restarted one node at a time.
> When the GPFS daemon is restarted on the last node in the cluster, the new
> log size becomes effective.
>
> We multi-cluster the file-system, so do we have to restart every node in
> all clusters, or just in the storage cluster?
>
> And how do we tell once it has become active?
>
> Thanks
>
> Simon
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 16:24:41 2017
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Thu, 4 May 2017 15:24:41 +0000
Subject: [gpfsug-discuss] Well, this is the pits...
Message-ID: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu>

Hi All,

Another one of those, "I can open a PMR if I need to" type questions?
We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 16:34:34 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 17:34:34 +0200 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 16:43:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 15:43:56 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Message-ID: <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. 
than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 4 16:45:53 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 4 May 2017 15:45:53 +0000 Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance Message-ID: Hi all, I'm going through the steps outlines in the following article: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_managecertforgui.htm Will this work for 4.2.1 installations? Only because in step 5, "Generate a Java(tm) keystore file (.jks) by using the keytool. It is stored in the following directory:", the given directory - /opt/ibm/wlp/java/jre/bin - does not exist. Only the path upto and including wlp is on my GUI server. I can't imagine the instructions being so different between 4.2.1 and 4.2 but I've seen it happen.. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 16:54:50 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 17:54:50 +0200 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu May 4 16:55:36 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 4 May 2017 15:55:36 +0000 Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance In-Reply-To: References: Message-ID: Never mind - /usr/lpp/mmfs/java/jre/bin is where it's at. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 May 2017 16:46 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance Hi all, I'm going through the steps outlines in the following article: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_managecertforgui.htm Will this work for 4.2.1 installations? Only because in step 5, "Generate a Java(tm) keystore file (.jks) by using the keytool. It is stored in the following directory:", the given directory - /opt/ibm/wlp/java/jre/bin - does not exist. Only the path upto and including wlp is on my GUI server. I can't imagine the instructions being so different between 4.2.1 and 4.2 but I've seen it happen.. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 17:07:32 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 16:07:32 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Message-ID: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? 
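For reference, a sketch of the QOS throttling under discussion; the file system name, pool and IOPS figure are placeholders rather than the poster's actual values:

# cap maintenance-class I/O against the capacity pool, leave normal I/O unthrottled
mmchqos gpfs23 --enable pool=capacity,maintenance=300IOPS,other=unlimited
# restripes run in the maintenance class; watch consumption while one is active
mmrestripefs gpfs23 -b -P capacity --qos maintenance
mmlsqos gpfs23 --seconds 60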
QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Thu May 4 17:11:53 2017 From: salut4tions at gmail.com (Jordan Robertson) Date: Thu, 4 May 2017 12:11:53 -0400 Subject: [gpfsug-discuss] Well, this is the pits... 
In-Reply-To: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Kevin, The math currently used in the code appears to be "greater than 31 NSD's in the filesystem" combined with "greater than 31 pit worker threads", explicitly for a balancing restripe (we actually hit that combo on an older version of 3.5.x before the safety got written in there...it was a long day). At least, that's the apparent math used through 4.1.1.10, which we're currently running. If pitWorkerThreadsPerNode is set to 0 (default), GPFS should set the active thread number equal to the number of cores in the node, to a max of 16 threads I believe. Take in mind that for a restripe, it will also include the threads available on the fs manager. So if your fs manager and at least one helper node are both set to "0", and each contains at least 16 cores, the restripe "thread calculation" will exceed 31 threads so it won't run. We've had to tune our helper nodes to lower numbers (e.g a single helper node to 15 threads). Aaron please correct me if I'm braining that wrong anywhere. -Jordan On Thu, May 4, 2017 at 12:07 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Olaf, > > I didn?t touch pitWorkerThreadsPerNode ? it was already zero. > > I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or > 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes > this? With what I?m doing I need the ability to run mmrestripefs. > > It seems to me that mmrestripefs could check whether QOS is enabled ? > granted, it would have no way of knowing whether the values used actually > are reasonable or not ? but if QOS is enabled then ?trust? it to not > overrun the system. > > PMR time? Thanks.. > > Kevin > > On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: > > HI Kevin, > the number of NSDs is more or less nonsense .. it is just the number of > nodes x PITWorker should not exceed to much the #mutex/FS block > did you adjust/tune the PitWorker ? ... > > so far as I know.. that the code checks the number of NSDs is already > considered as a defect and will be fixed / is already fixed ( I stepped > into it here as well) > > ps. QOS is the better approach to address this, but unfortunately.. not > everyone is using it by default... that's why I suspect , the development > decide to put in a check/limit here .. which in your case(with QOS) > would'nt needed > > > > > > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM > Subject: Re: [gpfsug-discuss] Well, this is the pits... > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Olaf, > > Your explanation mostly makes sense, but... > > Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. > And this filesystem only has 32 disks, which I would imagine is not an > especially large number compared to what some people reading this e-mail > have in their filesystems. > > I thought that QOS (which I?m using) was what would keep an mmrestripefs > from overrunning the system ? QOS has worked extremely well for us - it?s > one of my favorite additions to GPFS. > > Kevin > > On May 4, 2017, at 10:34 AM, Olaf Weiser <*olaf.weiser at de.ibm.com* > > wrote: > > no.. 
it is just in the code, because we have to avoid to run out of mutexs > / block > > reduce the number of nodes -N down to 4 (2nodes is even more safer) ... > is the easiest way to solve it for now.... > > I've been told the real root cause will be fixed in one of the next ptfs > .. within this year .. > this warning messages itself should appear every time.. but unfortunately > someone coded, that it depends on the number of disks (NSDs).. that's why I > suspect you did'nt see it before > but the fact , that we have to make sure, not to overrun the system by > mmrestripe remains.. to please lower the -N number of nodes to 4 or better > 2 > > (even though we know.. than the mmrestripe will take longer) > > > From: "Buterbaugh, Kevin L" <*Kevin.Buterbaugh at Vanderbilt.Edu* > > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 05/04/2017 05:26 PM > Subject: [gpfsug-discuss] Well, this is the pits... > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > Hi All, > > Another one of those, ?I can open a PMR if I need to? type questions? > > We are in the process of combining two large GPFS filesystems into one new > filesystem (for various reasons I won?t get into here). Therefore, I?m > doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. > > Yesterday I did an ?mmrestripefs -r -N ? (after > suspending a disk, of course). Worked like it should. > > Today I did a ?mmrestripefs -b -P capacity -N servers>? and got: > > mmrestripefs: The total number of PIT worker threads of all participating > nodes has been exceeded to safely restripe the file system. The total > number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode > of the participating nodes, cannot exceed 31. Reissue the command with a > smaller set of participating nodes (-N option) and/or lower the > pitWorkerThreadsPerNode configure setting. By default the file system > manager node is counted as a participating node. > mmrestripefs: Command failed. Examine previous error messages to determine > cause. > > So there must be some difference in how the ?-r? and ?-b? options > calculate the number of PIT worker threads. I did an ?mmfsadm dump all | > grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem > manager node ? they all say the same thing: > > pitWorkerThreadsPerNode 0 > > Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > *Kevin.Buterbaugh at vanderbilt.edu* - > (615)875-9633 <(615)%20875-9633> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 17:49:20 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 12:49:20 -0400 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu><982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. 
that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 17:56:26 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 16:56:26 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram > wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. 
PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? 
on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 18:15:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 17:15:16 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: <8E68031C-8362-468B-873F-2B3D3B2A15B7@vanderbilt.edu> Hi Stephen, My apologies - Jordan?s response had been snagged by the University's SPAM filter (I went and checked and found it after receiving your e-mail)? Kevin On May 4, 2017, at 12:04 PM, Stephen Ulmer > wrote: Look at Jordan?s answer, he explains what significance 0 has. In short, GPFS will use one thread per core per server, so they could add to 31 quickly. ;) -- Stephen On May 4, 2017, at 12:56 PM, Buterbaugh, Kevin L > wrote: Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram > wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". 
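Spelled out with placeholder node names, the 4.2.2.x sizing works out like this (six participants at five threads each keeps the sum at 30, under the limit of 31):

mmchconfig pitWorkerThreadsPerNode=5 -N nsd01,nsd02,nsd03,nsd04,nsd05,nsd06
# after recycling GPFS on those nodes, confirm the value actually in effect
mmfsadm dump config | grep pitWorkerThreadsPerNode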
GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. 
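The disk-removal half of that workflow looks roughly like the following; the file system, NSD and node names are placeholders, and either QOS or a restricted -N list keeps the load in check:

# empty a disk before removing it from the file system
mmchdisk gpfs23 suspend -d "nsd_old01"
mmrestripefs gpfs23 -r -N nsd01,nsd02
mmdeldisk gpfs23 "nsd_old01" -N nsd01,nsd02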
Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 18:20:41 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 13:20:41 -0400 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu><982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu><27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: >>Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? pitWorkerThreadsPerNode -- Specifies how many threads do restripe, data movement, etc >>As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Value of 0 just indicates pitWorkerThreadsPerNode takes internal_value based on GPFS setup and file-system configuration (which can be 16 or lower) based on the following formula. 
Default is pitWorkerThreadsPerNode = MIN(16, (numberOfDisks_in_filesystem * 4) / numberOfParticipatingNodes_in_mmrestripefs + 1) For example, if you have 64 x NSDs in your file-system and you are using 8 NSD servers in "mmrestripefs -N", then pitWorkerThreadsPerNode = MIN (16, (256/8)+1) resulting in pitWorkerThreadsPerNode to take value of 16 ( default 0 will result in 16 threads doing restripe per mmrestripefs participating Node). If you want 8 NSD servers (running 4.2.2.3) to participate in mmrestripefs operation then set "mmchconfig pitWorkerThreadsPerNode=3 -N <8_NSD_Servers>" such that (8 x 3) is less than 31. Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:57 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. 
which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? 
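To put numbers on that: a setting of 0 means each node contributes the computed default, not zero. With the 32 disks in this file system and 8 participating NSD servers, the formula quoted above gives MIN(16, (32*4)/8 + 1) = 16 threads per node, so eight servers (plus the file system manager, which is counted too) land far past the limit of 31 that 4.2.2.x enforces. A sketch of the workaround already described, with an illustrative node list:

  mmchconfig pitWorkerThreadsPerNode=3 -N <the_8_nsd_servers>   # 8 x 3 = 24, safely under 31
  # restart GPFS on those nodes, then confirm the value the daemon is actually using:
  mmfsadm dump config | grep pitWorkerThreadsPerNode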
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 23:22:12 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 22:22:12 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov><9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be><7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi, >>So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. >>On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! This is good to hear. >> We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? If you are on 4.2.0.3 or higher, you can use workerThreads config paramter (start with value of 128, and increase in increments of 128 until MAX supported) and this setting will auto adjust values for other parameters such as prefetchThreads, worker3Threads etc. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Tuning%20Parameters In addition to trying larger file-system block-size (e.g. 4MiB or higher such that is aligns with storage volume RAID-stripe-width) and config parameters (e.g , workerThreads, ignorePrefetchLUNCount), it will be good to assess the "backend storage" performance for random I/O access pattern (with block I/O sizes in units of FS block-size) as this is more likely I/O scenario that the backend storage will experience when you have many GPFS nodes performing I/O simultaneously to the file-system (in production environment). mmcrfs has option "[-j {cluster | scatter}]". "-j scatter" would be recommended for consistent file-system performance over the lifetime of the file-system but then "-j scatter" will result in random I/O to backend storage (even though application is performing sequential I/O). 
For your test purposes, you may assess the GPFS file-system performance by mmcrfs with "-j cluster" and you may see good sequential results (compared to -j scatter) for lower client counts but as you scale the client counts the combined workload can result in "-j scatter" to backend storage (limiting the FS performance to random I/O performance of the backend storage). [snip from mmcrfs] layoutMap={scatter | cluster} Specifies the block allocation map type. When allocating blocks for a given file, GPFS first uses a round?robin algorithm to spread the data across all disks in the storage pool. After a disk is selected, the location of the data block on the disk is determined by the block allocation map type. If cluster is specified, GPFS attempts to allocate blocks in clusters. Blocks that belong to a particular file are kept adjacent to each other within each cluster. If scatter is specified, the location of the block is chosen randomly. The cluster allocation method may provide better disk performance for some disk subsystems in relatively small installations. The benefits of clustered block allocation diminish when the number of nodes in the cluster or the number of disks in a file system increases, or when the file system?s free space becomes fragmented. The cluster allocation method is the default for GPFS clusters with eight or fewer nodes and for file systems with eight or fewer disks. The scatter allocation method provides more consistent file system performance by averaging out performance variations due to block location (for many disk subsystems, the location of the data relative to the disk edge has a substantial effect on performance). This allocation method is appropriate in most cases and is the default for GPFS clusters with more than eight nodes or file systems with more than eight disks. The block allocation map type cannot be changed after the storage pool has been created. .. .. -j {cluster | scatter} Specifies the default block allocation map type to be used if layoutMap is not specified for a given storage pool. [/snip from mmcrfs] My two cents, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 05/04/2017 09:23 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We found out using ib_read_bw and ib_write_bw that there were some links between server and clients degraded, having a bandwith of 350MB/s strangely, nsdperf did not report the same. It reported 12GB/s write and 9GB/s read, which was much more then we actually could achieve. So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! Many thanks for the help !! We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? Cheers! Kenneth On 21/04/17 22:27, Kumaran Rajaram wrote: Hi Kenneth, As it was mentioned earlier, it will be good to first verify the raw network performance between the NSD client and NSD server using the nsdperf tool that is built with RDMA support. 
g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C In addition, since you have 2 x NSD servers it will be good to perform NSD client file-system performance test with just single NSD server (mmshutdown the other server, assuming all the NSDs have primary, server NSD server configured + Quorum will be intact when a NSD server is brought down) to see if it helps to improve the read performance + if there are variations in the file-system read bandwidth results between NSD_server#1 'active' vs. NSD_server #2 'active' (with other NSD server in GPFS "down" state). If there is significant variation, it can help to isolate the issue to particular NSD server (HW or IB issue?). You can issue "mmdiag --waiters" on NSD client as well as NSD servers during your dd test, to verify if there are unsual long GPFS waiters. In addition, you may issue Linux "perf top -z" command on the GPFS node to see if there is high CPU usage by any particular call/event (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been set to low value from the default 16M, then it can cause RDMA completion threads to go CPU bound ). Please verify some performance scenarios detailed in Chapter 22 in Spectrum Scale Problem Determination Guide (link below). https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc Thanks, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/21/2017 11:43 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. 
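A quick way to reproduce that check while a dd test is running, assuming the cpupower and turbostat utilities are installed (a sketch, not specific to any one distribution):

  cpupower monitor          # per-core frequency and idle-state residency during the test
  cpupower frequency-info   # which governor is active and what frequency limits are in force
  turbostat                 # alternative view including turbo and package C-states (run as root)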
A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... 
(Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
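For reference, the kind of poking around that experiment involves might look like the following; a sketch only, using standard sysfs/cpufreq paths, and the idle-state names exposed will vary with the platform and the intel_idle/acpi_idle driver in use:

  cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name        # which idle states (C1E among them, if exposed) the kernel can enter
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # governor currently driving frequency selection
  grep MHz /proc/cpuinfo | sort | uniq -c                     # rough view of the frequencies the cores are sitting at
  echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # pin one core to the performance governor (repeat per core, or use cpupower frequency-set -g performance)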
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From ckrafft at de.ibm.com Fri May 5 18:13:18 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Fri, 5 May 2017 19:13:18 +0200 Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve Message-ID: Hello folks, has anyone made "posotive" experiences with SCSI-3 Pers. Reserve? Is this "method" still valid for Linux? Thank you for any hints and tips! Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19235477.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From scale at us.ibm.com Fri May 5 20:18:12 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 5 May 2017 15:18:12 -0400 Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve In-Reply-To: References: Message-ID: SCSI-3 persistent reserve is still supported as documented in the FAQ. I personally do not have any experience using it. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Christoph Krafft" To: "gpfsug main discussion list" Date: 05/05/2017 01:14 PM Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello folks, has anyone made "posotive" experiences with SCSI-3 Pers. Reserve? Is this "method" still valid for Linux? Thank you for any hints and tips! Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon May 8 17:06:22 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 12:06:22 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable Message-ID: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
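A few checks that can help narrow down where a remote-cluster join like this is failing; a sketch that assumes the standard GPFS daemon port 1191 and the node/cluster names quoted above:

  mmremotecluster show all         # on the client cluster: contact nodes and key file registered for wosgpfs.wos-gateway01-ib0
  mmauth show all                  # on the owning (storage) cluster: which remote clusters are granted access, and the cipherList in force
  mmdiag --network | grep wos      # on a client: connection state to the new cluster's contact nodes
  nc -vz 10.20.179.1 1191          # plain TCP reachability of the GPFS daemon port across the IB fabric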
From S.J.Thompson at bham.ac.uk Mon May 8 17:12:35 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 8 May 2017 16:12:35 +0000 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Do you have multiple networks on the hosts? We've seen this sort of thing when rp_filter is dropping traffic with asynchronous routing. I know you said it's set to only go over IB, but if you have names that resolve onto you Ethernet, and admin name etc are not correct, it might be your problem. If you had 4.2, I'd suggest mmnetverify. I suppose that might work if you copied it out of the 4.x packages anyway? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca] Sent: 08 May 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. 
mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Mon May 8 17:23:01 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 12:23:01 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170508122301.25824jjpcvgd20dh@support.scinet.utoronto.ca> Quoting "Simon Thompson (IT Research Support)" : > Do you have multiple networks on the hosts? We've seen this sort of > thing when rp_filter is dropping traffic with asynchronous routing. > Yes Simon, All clients and servers have multiple interfaces on different networks, but we've been careful to always join nodes with the -ib0 resolution, always on IB. I can also query with 'mmlscluster' and all nodes involved are listed with the 10.20.x.x IP and -ib0 extension on their names. We don't have mmnetverify anywhere yet. Thanks Jaime > I know you said it's set to only go over IB, but if you have names > that resolve onto you Ethernet, and admin name etc are not correct, > it might be your problem. > > If you had 4.2, I'd suggest mmnetverify. I suppose that might work > if you copied it out of the 4.x packages anyway? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of > pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca] > Sent: 08 May 2017 17:06 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] help with multi-cluster setup: Network is > unreachable > > We have a setup in which "cluster 0" is made up of clients only on > gpfs v3.5, ie, no NDS's or formal storage on this primary membership. > > All storage for those clients come in a multi-cluster fashion, from > clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). > > We recently added a new storage cluster 4 (4.1.1-14), and for some > obscure reason we keep getting "Network is unreachable" during mount > by clients, even though there were no issues or errors with the > multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' > worked fine, and all clients have an entry in /etc/fstab for the file > system associated with the new cluster 4. The weird thing is that we > can mount cluster 3 fine (also 4.1). > > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. 
As far as > the IB network goes there are no problems routing/pinging around all > the clusters. So this must be internal to GPFS. > > None of the clusters have the subnet parameter set explicitly at > configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem > we need to. All have cipherList AUTHONLY. One difference is that > cluster 4 has DMAPI enabled (don't think it matters). > > Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients > during mount (10.20.179.1 is one of the NDS on cluster 4): > Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node > 10.20.179.1 failed with error 447 (client side). > Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount > wosgpfs.wos-gateway01-ib0:wosgpfs > Mon May 8 11:35:28.783 2017: Network is unreachable > > > I see this reference to "TLS handshake" and error 447, however > according to the manual this TLS is only set to be default on 4.2 > onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. > > mmdiag --network for some of the client gives this excerpt (broken status): > tapenode-ib0 10.20.83.5 > broken 233 -1 0 0 Linux/L > gpc-f114n014-ib0 10.20.114.14 > broken 233 -1 0 0 Linux/L > gpc-f114n015-ib0 10.20.114.15 > broken 233 -1 0 0 Linux/L > gpc-f114n016-ib0 10.20.114.16 > broken 233 -1 0 0 Linux/L > wos-gateway01-ib0 10.20.179.1 > broken 233 -1 0 0 Linux/L > > > > I guess I just need a hint on how to troubleshoot this situation (the > 4.1 troubleshoot guide is not helping). > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From eric.wonderley at vt.edu Mon May 8 17:34:44 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 8 May 2017 12:34:44 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. 
Vice versa...no so much. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon May 8 17:49:52 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 8 May 2017 16:49:52 +0000 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> Hi Eric, Jamie, Interesting comment as we do exactly the opposite! I always make sure that my servers are running a particular version before I upgrade any clients. Now we never mix and match major versions (i.e. 4.x and 3.x) for long ? those kinds of upgrades we do rapidly. But right now I?ve got clients running 4.2.0-3 talking just fine to 4.2.2.3 servers. To be clear, I?m not saying I?m right and Eric?s wrong at all - just an observation / data point. YMMV? Kevin On May 8, 2017, at 11:34 AM, J. Eric Wonderley > wrote: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. Vice versa...no so much. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon May 8 18:04:22 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 13:04:22 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170508130422.11171a2pqcx35p1y@support.scinet.utoronto.ca> Sorry, I made a mistake on the original description: all our clients are already on 4.1.1-7. Jaime Quoting "J. Eric Wonderley" : > Hi Jamie: > > I think typically you want to keep the clients ahead of the server in > version. I would advance the version of you client nodes. > > New clients can communicate with older versions of server nsds. Vice > versa...no so much. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From mweil at wustl.edu Mon May 8 18:07:03 2017 From: mweil at wustl.edu (Matt Weil) Date: Mon, 8 May 2017 12:07:03 -0500 Subject: [gpfsug-discuss] socketMaxListenConnections and net.core.somaxconn Message-ID: <39b63a8b-2ae7-f9a0-c1c4-319f84fa5354@wustl.edu> Hello all, what happens if we set socketMaxListenConnections to a larger number than we have clients? more memory used? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From pinto at scinet.utoronto.ca Mon May 8 18:12:38 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 13:12:38 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> Message-ID: <20170508131238.632312ooano92cxy@support.scinet.utoronto.ca> I only ask that we look beyond the trivial. The existing multi-cluster setup with mixed versions of servers already work fine with 4000+ clients on 4.1. We still have 3 legacy servers on 3.5, we already have a server on 4.1 also serving fine. The brand new 4.1 server we added last week seems to be at odds for some reason, not that obvious. Thanks Jaime Quoting "Buterbaugh, Kevin L" : > Hi Eric, Jamie, > > Interesting comment as we do exactly the opposite! > > I always make sure that my servers are running a particular version > before I upgrade any clients. Now we never mix and match major > versions (i.e. 4.x and 3.x) for long ? those kinds of upgrades we do > rapidly. But right now I?ve got clients running 4.2.0-3 talking > just fine to 4.2.2.3 servers. > > To be clear, I?m not saying I?m right and Eric?s wrong at all - just > an observation / data point. YMMV? > > Kevin > > On May 8, 2017, at 11:34 AM, J. Eric Wonderley > > wrote: > > Hi Jamie: > > I think typically you want to keep the clients ahead of the server > in version. I would advance the version of you client nodes. > > New clients can communicate with older versions of server nsds. > Vice versa...no so much. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
From valdis.kletnieks at vt.edu Mon May 8 20:48:19 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 08 May 2017 15:48:19 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <13767.1494272899@turing-police.cc.vt.edu> On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. As far as Have you verified that broadcast setting actually works, and packets aren't being discarded as martians? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon May 8 21:06:28 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 16:06:28 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <13767.1494272899@turing-police.cc.vt.edu> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> <13767.1494272899@turing-police.cc.vt.edu> Message-ID: <20170508160628.20766ng8x98ogjpg@support.scinet.utoronto.ca> Quoting valdis.kletnieks at vt.edu: > On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: > >> Another piece og information is that as far as GPFS goes all clusters >> are configured to communicate exclusively over Infiniband, each on a >> different 10.20.x.x network, but broadcast 10.20.255.255. As far as > > Have you verified that broadcast setting actually works, and packets > aren't being discarded as martians? > Yes, we have. They are fine. I'm seeing "failure to join the cluster" messages prior to the "network unreachable" in the mmfslog files, so I'm starting to suspect minor disparities between older releases of 3.5.x.x at one end and newer 4.1.x.x at the other. I'll dig a little more and report the findings. Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From UWEFALKE at de.ibm.com Tue May 9 08:16:23 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 9 May 2017 09:16:23 +0200 Subject: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Hi, Jaime, I'd suggest you trace a client while trying to connect and check what addresses it is going to talk to actually. It is a bit tedious, but you will be able to find this in the trace report file. You might also get an idea what's going wrong... Mit freundlichen Gr??en / Kind regards Dr. 
Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 05/08/2017 06:06 PM Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable Sent by: gpfsug-discuss-bounces at spectrumscale.org We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). 
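Spelling out the tracing suggestion above as a rough sequence (a sketch; the node name is a placeholder, the remote file system name is taken from the log excerpt, and trace reports land under /tmp/mmfs by default):

  mmtracectl --start -N client-node   # start tracing on the affected client
  mmmount wosgpfs -N client-node      # reproduce the failing remote mount
  mmtracectl --stop -N client-node    # stop tracing and format the trace report (trcrpt*)
  grep 10.20.179 /tmp/mmfs/trcrpt*    # see which addresses the daemon actually tried to reach

The report is verbose, as noted, but it records the connection attempts, which should be enough to tell whether the daemon is even trying the expected IB addresses.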
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Tue May 9 17:25:00 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 9 May 2017 16:25:00 +0000 Subject: [gpfsug-discuss] CES and Directory list populating very slowly Message-ID: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue May 9 18:00:22 2017 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 9 May 2017 10:00:22 -0700 Subject: [gpfsug-discuss] CES and Directory list populating very slowly In-Reply-To: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> References: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> Message-ID: ESS nodes have cache, but what matters most for this type of workloads is to have a very large metadata cache, this resides on the CES node for SMB/NFS workloads. 
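The cache parameters the rest of this reply refers to (maxFilesToCache, and maxStatCache alongside LROC) are set roughly like this; the values are purely illustrative and the 'cesNodes' node class name is an assumption about how the protocol nodes are grouped:

  mmchconfig maxFilesToCache=1000000 -N cesNodes   # full cached inodes/tokens on the protocol nodes; costs memory per entry
  mmchconfig maxStatCache=4000000 -N cesNodes      # cheaper stat-only entries, the part an LROC device can back
  mmlsconfig maxFilesToCache                       # confirm; GPFS must be restarted on those nodes for the change to apply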
so if you know that your client will use this 300k directory a lot you want to have a very large maxfilestocache setting on this nodes. alternative solution is to install a LROC device and configure a larger statcache, this helps especially if you have multiple larger directories and want to cache as many as possible from all of them. make sure you have enough tokenmanager and memory on them if you have multiple CES nodes and they all will have high settings. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Mark Bush To: gpfsug main discussion list Date: 05/09/2017 05:25 PM Subject: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Tue May 9 19:58:22 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 9 May 2017 14:58:22 -0400 Subject: [gpfsug-discuss] CES and Directory list populating very slowly In-Reply-To: References: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> Message-ID: If you haven't already, measure the time directly on the CES node command line skipping Windows and Samba overheads: time ls -l /path or time ls -lR /path Depending which you're interested in. From: "Sven Oehme" To: gpfsug main discussion list Date: 05/09/2017 01:01 PM Subject: Re: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org ESS nodes have cache, but what matters most for this type of workloads is to have a very large metadata cache, this resides on the CES node for SMB/NFS workloads. so if you know that your client will use this 300k directory a lot you want to have a very large maxfilestocache setting on this nodes. alternative solution is to install a LROC device and configure a larger statcache, this helps especially if you have multiple larger directories and want to cache as many as possible from all of them. make sure you have enough tokenmanager and memory on them if you have multiple CES nodes and they all will have high settings. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Mark Bush ---05/09/2017 05:25:39 PM---I have a customer who is struggling (they already have a PMR open and it?s being actively worked on From: Mark Bush To: gpfsug main discussion list Date: 05/09/2017 05:25 PM Subject: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed May 10 02:26:19 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 09 May 2017 21:26:19 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170509212619.88345qjpf9ea46kb@support.scinet.utoronto.ca> As it turned out, the 'authorized_keys' file placed in the /var/mmfs/ssl directory of the NDS for the new storage cluster 4 (4.1.1-14) needed an explicit entry of the following format for the bracket associated with clients on cluster 0: nistCompliance=off Apparently the default for 4.1.x is: nistCompliance=SP800-131A I just noticed that on cluster 3 (4.1.1-7) that entry is also present for the bracket associated with clients cluster 0. I guess the Seagate fellows that helped us install the G200 in our facility had that figured out. The original "TLS handshake" error message kind of gave me a hint of the problem, however the 4.1 installation manual specifically mentioned that this could be an issue only on 4.2 onward. The troubleshoot guide for 4.2 has this excerpt: "Ensure that the configurations of GPFS and the remote key management (RKM) server are compatible when it comes to the version of the TLS protocol used upon key retrieval (GPFS uses the nistCompliance configuration variable to control that). In particular, if nistCompliance=SP800-131A is set in GPFS, ensure that the TLS v1.2 protocol is enabled in the RKM server. If this does not resolve the issue, contact the IBM Support Center.". So, how am I to know that nistCompliance=off is even an option? For backward compatibility with the older storage clusters on 3.5 the clients cluster need to have nistCompliance=off I hope this helps the fellows in mixed versions environments, since it's not obvious from the 3.5/4.1 installation manuals or the troubleshoots guide what we should do. Thanks everyone for the help. Jaime Quoting "Uwe Falke" : > Hi, Jaime, > I'd suggest you trace a client while trying to connect and check what > addresses it is going to talk to actually. It is a bit tedious, but you > will be able to find this in the trace report file. You might also get an > idea what's going wrong... > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. 
Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thomas Wolter > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 05/08/2017 06:06 PM > Subject: [gpfsug-discuss] help with multi-cluster setup: Network is > unreachable > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > We have a setup in which "cluster 0" is made up of clients only on > gpfs v4.1, ie, no NDS's or formal storage on this primary membership. > > All storage for those clients come in a multi-cluster fashion, from > clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). > > We recently added a new storage cluster 4 (4.1.1-14), and for some > obscure reason we keep getting "Network is unreachable" during mount > by clients, even though there were no issues or errors with the > multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' > worked fine, and all clients have an entry in /etc/fstab for the file > system associated with the new cluster 4. The weird thing is that we > can mount cluster 3 fine (also 4.1). > > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. As far as > the IB network goes there are no problems routing/pinging around all > the clusters. So this must be internal to GPFS. > > None of the clusters have the subnet parameter set explicitly at > configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem > we need to. All have cipherList AUTHONLY. One difference is that > cluster 4 has DMAPI enabled (don't think it matters). > > Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients > during mount (10.20.179.1 is one of the NDS on cluster 4): > Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node > 10.20.179.1 failed with error 447 (client side). > Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount > wosgpfs.wos-gateway01-ib0:wosgpfs > Mon May 8 11:35:28.783 2017: Network is unreachable > > > I see this reference to "TLS handshake" and error 447, however > according to the manual this TLS is only set to be default on 4.2 > onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. 
> > mmdiag --network for some of the client gives this excerpt (broken > status): > tapenode-ib0 10.20.83.5 > broken 233 -1 0 0 Linux/L > gpc-f114n014-ib0 10.20.114.14 > broken 233 -1 0 0 Linux/L > gpc-f114n015-ib0 10.20.114.15 > broken 233 -1 0 0 Linux/L > gpc-f114n016-ib0 10.20.114.16 > broken 233 -1 0 0 Linux/L > wos-gateway01-ib0 10.20.179.1 > broken 233 -1 0 0 Linux/L > > > > I guess I just need a hint on how to troubleshoot this situation (the > 4.1 troubleshoot guide is not helping). > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Robert.Oesterlin at nuance.com Wed May 10 15:13:56 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 May 2017 14:13:56 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> I could not find any way to find out what the issue is here - ideas? [root]# mmhealth cluster show nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. I?ve tried it multiple times, always returns this error. I recently switched the cluster over to 4.2.2 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed May 10 16:46:21 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 10 May 2017 11:46:21 -0400 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> References: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> Message-ID: <3939.1494431181@turing-police.cc.vt.edu> On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed May 10 16:52:35 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 10 May 2017 15:52:35 +0000 Subject: [gpfsug-discuss] patched rsync question Message-ID: <27CCB813-DF05-49A6-A510-51499DFF4B85@vanderbilt.edu> Hi All, We are using the patched version of rsync: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, gpfs, iconv, symtimes rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details. to copy files from our old GPFS filesystem to our new GPFS filesystem. Unfortunately, for one group I inadvertently left off the ?-A? option when rsync?ing them, so it didn?t preserve their ACL?s. The original files were deleted, but we were able to restore them from a backup taken on April 25th. I looked, but cannot find any option to rsync that would only update based on ACL?s / permissions. Out of 13,000+ files, it appears that 910 have been modified in the interim. So what I am thinking of doing is rerunning the rsync from the restore directory to the new filesystem directory with the -A option. I?ll test this with ??dry-run? first, of course. I am thinking that this will update the ACL?s on all but the 910 modified files, which would then have to be dealt with on a case by case basis. Anyone have any comments on this idea or any better ideas? Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed May 10 17:20:39 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 May 2017 16:20:39 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Yea, it?s fine. I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. Seems a bit fragile :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? From Kevin.Buterbaugh at Vanderbilt.Edu Wed May 10 18:57:11 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 10 May 2017 17:57:11 +0000 Subject: [gpfsug-discuss] patched rsync question In-Reply-To: References: <27CCB813-DF05-49A6-A510-51499DFF4B85@vanderbilt.edu> Message-ID: Hi Stephen, Thanks for the suggestion. 
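One low-risk way to preview exactly which files would have their ACLs touched, before changing anything, is a dry run with the patched rsync's itemized output (paths here are placeholders):

rsync -nAaiv --existing /restore/group/ /newfs/group/

-n keeps it read-only, -A brings ACLs into the comparison, -i itemizes what would change, and --existing skips anything that does not already exist on the destination, so the files modified since the backup stand out from the pure ACL updates.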
We thought about doing something similar to this but in the end I just ran a: rsync -aAvu /old/location /new/location And that seems to have updated the ACL?s on everything except the 910 modified files, which we?re dealing with in a manner similar to what you suggest below. Thanks all? Kevin On May 10, 2017, at 12:51 PM, Stephen Ulmer > wrote: If there?s only 13K files, and you don?t want to copy them, why use rsync at all? I think your solution is: * check every restored for for an ACL * copy the ACL to the same file in the new file system What about generating a file list and then just traversing it dumping the ACL from the restored file and adding it to the new file (after transforming the path). You could probably do the dump/assign with a pipe and not even write the ACLs down. You can even multi-thread the process if you have GNU xargs. Something like (untested): xargs -P num_cores_or_something ./helper_script.sh < list_of_files Where helper_script.sh is (also untested): NEWPATH=$( echo $1 | sed -e ?s/remove/replace/' ) getfacl $1 | setfacl $NEWPATH -- Stephen On May 10, 2017, at 11:52 AM, Buterbaugh, Kevin L > wrote: Hi All, We are using the patched version of rsync: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, gpfs, iconv, symtimes rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details. to copy files from our old GPFS filesystem to our new GPFS filesystem. Unfortunately, for one group I inadvertently left off the ?-A? option when rsync?ing them, so it didn?t preserve their ACL?s. The original files were deleted, but we were able to restore them from a backup taken on April 25th. I looked, but cannot find any option to rsync that would only update based on ACL?s / permissions. Out of 13,000+ files, it appears that 910 have been modified in the interim. So what I am thinking of doing is rerunning the rsync from the restore directory to the new filesystem directory with the -A option. I?ll test this with ??dry-run? first, of course. I am thinking that this will update the ACL?s on all but the 910 modified files, which would then have to be dealt with on a case by case basis. Anyone have any comments on this idea or any better ideas? Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From usa-principal at gpfsug.org Wed May 10 21:01:05 2017 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Wed, 10 May 2017 13:01:05 -0700 Subject: [gpfsug-discuss] Presentations Uploaded - SSUG Event @NERSC April 4-5 Message-ID: <7501c112d2e6ff79f9c89907a292ddab@webmail.gpfsug.org> All, I have just updated the Presentations page with 19 talks from the US SSUG event last month. The videos should be available on YouTube soon. I'll announce that separately. 
https://www.spectrumscale.org/presentations/ Cheers, Kristy From Anna.Wagner at de.ibm.com Thu May 11 12:28:22 2017 From: Anna.Wagner at de.ibm.com (Anna Christina Wagner) Date: Thu, 11 May 2017 13:28:22 +0200 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> References: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Message-ID: Hello Bob, 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. So a short explanation: We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not know, that it is the CSM and will not start the corresponding service for that. If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) Mit freundlichen Gr??en / Kind regards Wagner, Anna Christina Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10.05.2017 18:21 Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error Sent by: gpfsug-discuss-bounces at spectrumscale.org Yea, it?s fine. I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. Seems a bit fragile :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu May 11 13:05:14 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 11 May 2017 08:05:14 -0400 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: References: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Message-ID: I?ve also been exploring the mmhealth and gpfsgui for the first time this week. I have a test cluster where I?m trying the new stuff. 
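Based on Anna's explanation, a quick sanity check when the command misbehaves (a sketch; all three commands are standard in 4.2.2) is to confirm where the cluster manager, and therefore the CSM, currently lives, recycle the health monitor there, and re-query:

mmlsmgr -c                # the cluster state manager follows the cluster manager
mmsysmoncontrol restart   # restart the system health monitor on that node
mmhealth cluster show     # re-check once the monitor has come back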
Running 4.2.2-2 mmhealth cluster show says everyone is in nominal status: Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 12 0 0 12 0 GPFS 12 0 0 12 0 NETWORK 12 0 0 12 0 FILESYSTEM 0 0 0 0 0 DISK 0 0 0 0 0 GUI 1 0 0 1 0 PERFMON 12 0 0 12 0 However on the GUI there is conflicting information: 1) Home page shows 3/8 NSD Servers unhealthy 2) Home page shows 3/21 Nodes unhealthy ? where is it getting this notion? ? there are only 12 nodes in the whole cluster! 3) clicking on either NSD Servers or Nodes leads to the monitoring page where the top half spins forever, bottom half is content-free. I may have installed the pmsensors RPM on a couple of other nodes back in early April, but have forgotten which ones. They are in the production cluster. Also, the storage in this sandbox cluster has not been turned into a filesystem yet. There are a few dozen free NSDs. Perhaps the ?FILESYSTEM CHECKING? status is somehow wedging up the GUI? Node name: storage005.oscar.ccv.brown.edu Node status: HEALTHY Status Change: 15 hours ago Component Status Status Change Reasons ------------------------------------------------------ GPFS HEALTHY 16 hours ago - NETWORK HEALTHY 16 hours ago - FILESYSTEM CHECKING 16 hours ago - GUI HEALTHY 15 hours ago - PERFMON HEALTHY 16 hours ago I?ve tried restarting the GUI service and also rebooted the GUI server, but it comes back looking the same. Any thoughts? > On May 11, 2017, at 7:28 AM, Anna Christina Wagner wrote: > > Hello Bob, > > 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. > > So a short explanation: > We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands > took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager > was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not > know, that it is the CSM and will not start the corresponding service for that. > > > If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) > > Mit freundlichen Gr??en / Kind regards > > Wagner, Anna Christina > > Software Engineer, Spectrum Scale Development > IBM Systems > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz > Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 10.05.2017 18:21 > Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Yea, it?s fine. > > I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. > > Seems a bit fragile :-) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: > > On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > > > [root]# mmhealth cluster show > > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. 
It may be in an failover process. Please try again in a few seconds. > > Does 'mmlsmgr' return something sane? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu May 11 13:36:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 11 May 2017 12:36:47 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <9C601DFD-16FF-40E7-8D46-16033C443428@nuance.com> Thanks Anna, I will email you directly. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Anna Christina Wagner Reply-To: gpfsug main discussion list Date: Thursday, May 11, 2017 at 6:28 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] "mmhealth cluster show" returns error Hello Bob, 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. So a short explanation: We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not know, that it is the CSM and will not start the corresponding service for that. If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) Mit freundlichen Gr??en / Kind regards Wagner, Anna Christina Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christian.Fey at sva.de Thu May 11 16:37:43 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Thu, 11 May 2017 15:37:43 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Message-ID: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. 
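Working the arithmetic first: a low bound of 200000 with a rangesize of 800000 gives slots at

range 0: 200000 - 999999
range 1: 1000000 - 1799999
range 2: 1800000 - 2599999

Autorid reserves the first slot for its internal ALLOC pool, and in practice BUILTIN claims the next one before any AD domain is mapped (the 'net idmap get ranges' output below shows exactly that), so the domain can only land on the third slot, i.e. IDs from 1800000 rather than the hoped-for 1000000.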
Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu May 11 18:49:02 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 11 May 2017 17:49:02 +0000 Subject: [gpfsug-discuss] Edge case failure mode Message-ID: Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. 
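A crude first check from a CES node that the remote cluster's daemon port is even reachable is something like the following (1191 is the default GPFS daemon port; the hostname is a placeholder):

nc -zv remote-nsd-server 1191

If that succeeds but the admin calls still time out, the problem is more likely packets being dropped on arrival, which is where reverse-path filtering comes into the story.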
We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon From bbanister at jumptrading.com Thu May 11 18:58:18 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 11 May 2017 17:58:18 +0000 Subject: [gpfsug-discuss] Edge case failure mode In-Reply-To: References: Message-ID: <87b204b6e245439bb475792cf3672aa5@jumptrading.com> Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA56.41F66270] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). 
That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From S.J.Thompson at bham.ac.uk Thu May 11 19:05:08 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 11 May 2017 18:05:08 +0000 Subject: [gpfsug-discuss] Edge case failure mode Message-ID: Cheers Bryan ... 
http://goo.gl/YXitIF Points to: (Outlook/mailing list is line breaking and cutting the trailing 0) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030 Simon From: > on behalf of "bbanister at jumptrading.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 11 May 2017 at 18:58 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Edge case failure mode Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA56.41F66270] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. 
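For reference, a sketch of what that check and the change look like (the sysctl.d file name is arbitrary and only there so the setting survives a reboot):

sysctl net.ipv4.conf.all.rp_filter              # 1 = strict, 2 = loose, 0 = off
sysctl -w net.ipv4.conf.all.rp_filter=2
echo 'net.ipv4.conf.all.rp_filter = 2' > /etc/sysctl.d/90-rp-filter.conf

The kernel applies the numerically higher of the 'all' and per-interface values, so raising 'all' to 2 is enough to put every interface into loose mode.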
Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From pinto at scinet.utoronto.ca Thu May 11 19:17:06 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 11 May 2017 14:17:06 -0400 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation In-Reply-To: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> Message-ID: <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drove people's attention. I hope to get some comments now. Thanks Jaime Quoting "Jaime Pinto" : > In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota > once a month, usually after the massive monthly purge. > > I noticed that starting with the GSS and ESS appliances under 3.5 that > I needed to run mmcheckquota more often, at least once a week, or as > often as daily, to clear the slippage errors in the accounting > information, otherwise users complained that they were hitting their > quotas, even throughout they deleted a lot of stuff. > > More recently we adopted a G200 appliance (1.8PB), with v4.1, and now > things have gotten worst, and I have to run it twice daily, just in > case. > > So, what I am missing? Is their a parameter since 3.5 and through 4.1 > that we can set, so that GPFS will reconcile the quota accounting > internally more often and on its own? 
> > Thanks > Jaime > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From bbanister at jumptrading.com Thu May 11 19:20:47 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 11 May 2017 18:20:47 +0000 Subject: [gpfsug-discuss] Edge case failure mode In-Reply-To: References: Message-ID: <607e7c81dd3349fd8c0a8602d1938e3b@jumptrading.com> I was wondering why that 0 was left on that line alone... hahaha, -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 1:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Edge case failure mode Cheers Bryan ... http://goo.gl/YXitIF Points to: (Outlook/mailing list is line breaking and cutting the trailing 0) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030 Simon From: > on behalf of "bbanister at jumptrading.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 11 May 2017 at 18:58 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Edge case failure mode Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA59.65CF7300] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. 
(Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From UWEFALKE at de.ibm.com Thu May 11 20:42:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 May 2017 21:42:29 +0200 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quotaaccountingreconciliation In-Reply-To: <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Message-ID: Hi, Jaimie, we got the same problem, also with a GSS although I suppose it's rather to do with the code above GNR, but who knows. I have a PMR open for quite some time (and had others as well). Seems like things improved by upgrading the FS version, but atre not gone. However, these issues are to be solved via PMRs. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" To: "gpfsug main discussion list" , "Jaime Pinto" Date: 05/11/2017 08:17 PM Subject: Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation Sent by: gpfsug-discuss-bounces at spectrumscale.org Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drove people's attention. I hope to get some comments now. Thanks Jaime Quoting "Jaime Pinto" : > In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota > once a month, usually after the massive monthly purge. > > I noticed that starting with the GSS and ESS appliances under 3.5 that > I needed to run mmcheckquota more often, at least once a week, or as > often as daily, to clear the slippage errors in the accounting > information, otherwise users complained that they were hitting their > quotas, even throughout they deleted a lot of stuff. > > More recently we adopted a G200 appliance (1.8PB), with v4.1, and now > things have gotten worst, and I have to run it twice daily, just in > case. > > So, what I am missing? 
Is their a parameter since 3.5 and through 4.1 > that we can set, so that GPFS will reconcile the quota accounting > internally more often and on its own? > > Thanks > Jaime > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From damir.krstic at gmail.com Fri May 12 11:42:19 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 12 May 2017 10:42:19 +0000 Subject: [gpfsug-discuss] connected v. datagram mode Message-ID: I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and are in datagram mode. In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. What is is the right thing to do? Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri May 12 12:43:01 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 12 May 2017 07:43:01 -0400 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quotaaccountingreconciliation In-Reply-To: References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Message-ID: <20170512074301.91955kiad218rl51@support.scinet.utoronto.ca> I like to give the community a chance to reflect on the issue, check their own installations and possibly give us all some comments. If in a few more days we still don't get any hints I'll have to open a couple of support tickets (IBM, DDN, Seagate, ...). Cheers Jaime Quoting "Uwe Falke" : > Hi, Jaimie, > > we got the same problem, also with a GSS although I suppose it's rather to > do with the code above GNR, but who knows. > I have a PMR open for quite some time (and had others as well). > Seems like things improved by upgrading the FS version, but atre not gone. > > > However, these issues are to be solved via PMRs. > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thomas Wolter > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Jaime Pinto" > Date: 05/11/2017 08:17 PM > Subject: Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting > reconciliation > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Just bumping up. > When I first posted this subject at the end of March there was a UG > meeting that drove people's attention. > > I hope to get some comments now. > > Thanks > Jaime > > Quoting "Jaime Pinto" : > >> In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota >> once a month, usually after the massive monthly purge. >> >> I noticed that starting with the GSS and ESS appliances under 3.5 that >> I needed to run mmcheckquota more often, at least once a week, or as >> often as daily, to clear the slippage errors in the accounting >> information, otherwise users complained that they were hitting their >> quotas, even throughout they deleted a lot of stuff. >> >> More recently we adopted a G200 appliance (1.8PB), with v4.1, and now >> things have gotten worst, and I have to run it twice daily, just in >> case. >> >> So, what I am missing? Is their a parameter since 3.5 and through 4.1 >> that we can set, so that GPFS will reconcile the quota accounting >> internally more often and on its own? >> >> Thanks >> Jaime >> > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jonathon.anderson at colorado.edu Fri May 12 15:43:55 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 12 May 2017 14:43:55 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: Message-ID: This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? 
and detects whether data is received completely and in the correct order. The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. ~jonathon On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and are in datagram mode. In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. What is is the right thing to do? Thanks in advance. Damir From aaron.s.knister at nasa.gov Fri May 12 15:48:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 12 May 2017 10:48:14 -0400 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: Message-ID: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> For what it's worth we've seen *significantly* better performance of streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. -Aaron On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received completely and in the correct order. > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. > > ~jonathon > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > are in datagram mode. > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > What is is the right thing to do? > > > Thanks in advance. > Damir > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From janfrode at tanso.net Fri May 12 16:03:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 12 May 2017 15:03:03 +0000 Subject: [gpfsug-discuss] connected v. 
datagram mode In-Reply-To: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: -------------- Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. --------------- -jf fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : > For what it's worth we've seen *significantly* better performance of > streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. > > -Aaron > > On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > > This won?t tell you which to use; but datagram mode and connected mode > in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that > there?s no checking/retry built into the protocol; the other is ?reliable? > and detects whether data is received completely and in the correct order. > > > > The last advice I heard for traditional IB was that the overhead of > connected mode isn?t worth it, particularly if you?re using IPoIB (where > you?re likely to be using TCP anyway). That said, on our OPA network we?re > seeing the opposite advice; so I, to, am often unsure what the most correct > configuration would be for any given fabric. > > > > ~jonathon > > > > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Damir Krstic" behalf of damir.krstic at gmail.com> wrote: > > > > I never fully understood the difference between connected v. > datagram mode beside the obvious packet size difference. Our NSD servers > (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our > 700+ clients are running RH6 and > > are in datagram mode. > > > > > > In a month we are upgrading our cluster to RedHat 7 and are debating > whether to leave the compute nodes in datagram mode or whether to switch > them to connected mode. > > What is is the right thing to do? > > > > > > Thanks in advance. > > Damir > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Fri May 12 16:05:47 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 12 May 2017 15:05:47 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: It may be true that you should always favor connected mode; but those instructions look like they?re specifically only talking about when you have bonded interfaces. 
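For readers wondering what that bonded-slave statement actually looks like in practice, here is a minimal sketch of an IPoIB bond slave interface script on RHEL with connected mode enabled; the device name, bond name and MTU are illustrative assumptions rather than anything taken from this thread:

  # /etc/sysconfig/network-scripts/ifcfg-ib0   (one such file per bond slave)
  DEVICE=ib0
  TYPE=InfiniBand
  ONBOOT=yes
  BOOTPROTO=none
  MASTER=bond0            # the bond interface carrying the IPoIB address
  SLAVE=yes
  NM_CONTROLLED=no
  CONNECTED_MODE=yes      # set to no (or omit) to run the port in datagram mode
  MTU=65520               # connected mode permits a much larger IPoIB MTU than datagram

After editing the slave scripts the bond has to be bounced (or the node rebooted) before the mode change takes effect.
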
~jonathon On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jan-Frode Myklebust" wrote: I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: -------------- Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. --------------- -jf fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : For what it's worth we've seen *significantly* better performance of streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. -Aaron On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received completely and in the correct order. > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. > > ~jonathon > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > are in datagram mode. > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > What is is the right thing to do? > > > Thanks in advance. > Damir > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From usa-principal at gpfsug.org Fri May 12 17:03:46 2017 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Fri, 12 May 2017 09:03:46 -0700 Subject: [gpfsug-discuss] YouTube Videos of Talks - April 4-5 US SSUG Meeting at NERSC Message-ID: All, The YouTube videos are now available on the Spectrum Scale/GPFS User Group channel, and will be on the IBM channel as well in the near term. https://www.youtube.com/playlist?list=PLrdepxEIEyCp1TqZ2z3WfGOgqO9oY01xY Cheers, Kristy From laurence at qsplace.co.uk Sat May 13 00:27:19 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sat, 13 May 2017 00:27:19 +0100 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: It also depends on the adapter. 
We have seen better performance using datagram with MLNX adapters however we see better in connected mode when using Intel truescale. Again as Jonathon has mentioned we have also seen better performance when using connected mode on active/slave bonded interface (even between a mixed MLNX/TS fabric). There is also a significant difference in the MTU size you can use in datagram vs connected mode, with datagram being limited to 2044 (if memory serves) there as connected mode can use 65536 (again if memory serves). I typically now run qperf and nsdperf benchmarks to find the best configuration. -- Lauz On 12/05/2017 16:05, Jonathon A Anderson wrote: > It may be true that you should always favor connected mode; but those instructions look like they?re specifically only talking about when you have bonded interfaces. > > ~jonathon > > > On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jan-Frode Myklebust" wrote: > > > > > I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: > > -------------- > Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These > scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. > --------------- > > > -jf > fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : > > > For what it's worth we've seen *significantly* better performance of > streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. > > -Aaron > > On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received > completely and in the correct order. > > > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am > often unsure what the most correct configuration would be for any given fabric. > > > > ~jonathon > > > > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" on behalf of damir.krstic at gmail.com> wrote: > > > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > > are in datagram mode. > > > > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > > What is is the right thing to do? > > > > > > Thanks in advance. 
> > Damir > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at > spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Sun May 14 10:16:12 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Sun, 14 May 2017 11:16:12 +0200 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: hi all, does anyone know about the impact of memory usage? afaik, connected mode keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2) instructions suggested not to use CM for large-ish (>128 nodes at that time) clusters. we never turned it back on, and now have 700 nodes. wrt IPoIB MTU, UD can have up to 4042 (or something like that) with correct opensm configuration. stijn On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote: > It also depends on the adapter. > > We have seen better performance using datagram with MLNX adapters > however we see better in connected mode when using Intel truescale. > Again as Jonathon has mentioned we have also seen better performance > when using connected mode on active/slave bonded interface (even between > a mixed MLNX/TS fabric). > > There is also a significant difference in the MTU size you can use in > datagram vs connected mode, with datagram being limited to 2044 (if > memory serves) there as connected mode can use 65536 (again if memory > serves). > > I typically now run qperf and nsdperf benchmarks to find the best > configuration. > > -- Lauz > > On 12/05/2017 16:05, Jonathon A Anderson wrote: >> It may be true that you should always favor connected mode; but those >> instructions look like they?re specifically only talking about when >> you have bonded interfaces. >> >> ~jonathon >> >> >> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of Jan-Frode Myklebust" >> > janfrode at tanso.net> wrote: >> >> I also don't know much about this, but the ESS >> quick deployment guide is quite clear on the we should use connected >> mode for IPoIB: >> -------------- >> Note: If using bonded IP over IB, do the following: Ensure that >> the CONNECTED_MODE=yes statement exists in the corresponding >> slave-bond interface scripts located in /etc/sysconfig/network-scripts >> directory of the management server and I/O server nodes. These >> scripts are created as part of the IP over IB bond creation. An >> example of the slave-bond interface with the modification is shown below. >> --------------- >> -jf >> fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister >> : >> For what it's worth we've seen *significantly* better >> performance of >> streaming benchmarks of IPoIB with connected mode vs datagram >> mode on IB. >> -Aaron >> On 5/12/17 10:43 AM, Jonathon A Anderson wrote: >> > This won?t tell you which to use; but datagram mode and >> connected mode in IB is roughly analogous to UDB vs TCP in IP. One is >> ?unreliable? 
in that there?s no checking/retry built into the >> protocol; the other is ?reliable? and detects whether data is received >> completely and in the correct order. >> > >> > The last advice I heard for traditional IB was that the >> overhead of connected mode isn?t worth it, particularly if you?re >> using IPoIB (where you?re likely to be using TCP anyway). That said, >> on our OPA network we?re seeing the opposite advice; so I, to, am >> often unsure what the most correct configuration would be for >> any given fabric. >> > >> > ~jonathon >> > >> > >> > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org >> on behalf of Damir Krstic" > on behalf of damir.krstic at gmail.com> wrote: >> > >> > I never fully understood the difference between connected >> v. datagram mode beside the obvious packet size difference. Our NSD >> servers (ESS GL6 nodes) are installed with RedHat 7 and are in >> connected mode. Our 700+ clients are running RH6 and >> > are in datagram mode. >> > >> > >> > In a month we are upgrading our cluster to RedHat 7 and are >> debating whether to leave the compute nodes in datagram mode or >> whether to switch them to connected mode. >> > What is is the right thing to do? >> > >> > >> > Thanks in advance. >> > Damir >> > >> > >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at >> spectrumscale.org >> > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at >> spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Mon May 15 00:41:13 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 14 May 2017 23:41:13 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: <82aac761681744b28e7010f22ef7cb81@exch1-cdc.nexus.csiro.au> I asked Mellanox about this nearly 2 years ago and was told around the 500 node mark there will be a tipping point and that datagram will be more useful after that. Memory utilisation was the issue. I've also seen references to smaller node counts more recently as well as generic recommendations to use datagram for any size cluster. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stijn De Weirdt Sent: Sunday, 14 May 2017 7:16 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] connected v. datagram mode hi all, does anyone know about the impact of memory usage? afaik, connected mode keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2) instructions suggested not to use CM for large-ish (>128 nodes at that time) clusters. we never turned it back on, and now have 700 nodes. wrt IPoIB MTU, UD can have up to 4042 (or something like that) with correct opensm configuration. 
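As a concrete illustration of the "correct opensm configuration" mentioned just above: the IPoIB datagram MTU is bounded by the MTU of the IPoIB partition/broadcast group that opensm advertises, so getting UD above ~2044 means enabling a 4K MTU on the partition. A minimal sketch of the relevant line in the subnet manager's partition file, assuming the default partition key and 4K-capable HCAs/switches (both assumptions to verify on your own fabric):

  # /etc/opensm/partitions.conf
  # mtu=5 encodes a 4096-byte IB MTU (1=256, 2=512, 3=1024, 4=2048, 5=4096)
  Default=0x7fff, ipoib, mtu=5 : ALL=full;

With a 4096-byte partition MTU the usable IPoIB datagram MTU comes out a little under 4096 once the IPoIB header is subtracted, which is the just-under-4K figure quoted above; opensm has to be restarted for the change to be picked up.
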
stijn On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote: > It also depends on the adapter. > > We have seen better performance using datagram with MLNX adapters > however we see better in connected mode when using Intel truescale. > Again as Jonathon has mentioned we have also seen better performance > when using connected mode on active/slave bonded interface (even > between a mixed MLNX/TS fabric). > > There is also a significant difference in the MTU size you can use in > datagram vs connected mode, with datagram being limited to 2044 (if > memory serves) there as connected mode can use 65536 (again if memory > serves). > > I typically now run qperf and nsdperf benchmarks to find the best > configuration. > > -- Lauz > > On 12/05/2017 16:05, Jonathon A Anderson wrote: >> It may be true that you should always favor connected mode; but those >> instructions look like they?re specifically only talking about when >> you have bonded interfaces. >> >> ~jonathon >> >> >> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of Jan-Frode Myklebust" >> > janfrode at tanso.net> wrote: >> >> I also don't know much about this, but the ESS >> quick deployment guide is quite clear on the we should use connected >> mode for IPoIB: >> -------------- >> Note: If using bonded IP over IB, do the following: Ensure that >> the CONNECTED_MODE=yes statement exists in the corresponding >> slave-bond interface scripts located in >> /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These >> scripts are created as part of the IP over IB bond creation. An >> example of the slave-bond interface with the modification is shown below. >> --------------- >> -jf >> fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister >> : >> For what it's worth we've seen *significantly* better >> performance of >> streaming benchmarks of IPoIB with connected mode vs datagram >> mode on IB. >> -Aaron >> On 5/12/17 10:43 AM, Jonathon A Anderson wrote: >> > This won?t tell you which to use; but datagram mode and >> connected mode in IB is roughly analogous to UDB vs TCP in IP. One is >> ?unreliable? in that there?s no checking/retry built into the >> protocol; the other is ?reliable? and detects whether data is received >> completely and in the correct order. >> > >> > The last advice I heard for traditional IB was that the >> overhead of connected mode isn?t worth it, particularly if you?re >> using IPoIB (where you?re likely to be using TCP anyway). That said, >> on our OPA network we?re seeing the opposite advice; so I, to, am >> often unsure what the most correct configuration would be for >> any given fabric. >> > >> > ~jonathon >> > >> > >> > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org >> on behalf of Damir Krstic" > on behalf of damir.krstic at gmail.com> wrote: >> > >> > I never fully understood the difference between connected >> v. datagram mode beside the obvious packet size difference. Our NSD >> servers (ESS GL6 nodes) are installed with RedHat 7 and are in >> connected mode. Our 700+ clients are running RH6 and >> > are in datagram mode. >> > >> > >> > In a month we are upgrading our cluster to RedHat 7 and are >> debating whether to leave the compute nodes in datagram mode or >> whether to switch them to connected mode. >> > What is is the right thing to do? >> > >> > >> > Thanks in advance. 
>> > Damir >> > >> > >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at >> spectrumscale.org >> > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at >> spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From varun.mittal at in.ibm.com Mon May 15 19:39:28 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Tue, 16 May 2017 00:09:28 +0530 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration In-Reply-To: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: "Fey, Christian" To: gpfsug main discussion list Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. 
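An aside for anyone puzzling over why the domain ended up in the third range: autorid carves the configured range into rangesize-wide slots from the bottom and hands them out in the order mappings are first needed, and the ALLOC pool plus the BUILTIN domain (S-1-5-32) each take a slot ahead of the first AD domain. Sketching the arithmetic with the values quoted above (the slot layout follows from the idmap_autorid behaviour as I understand it, so treat it as something to verify rather than gospel):

  range = 200000-2999999, rangesize = 800000
  slot 0:  200000 -  999999   -> ALLOC pool
  slot 1: 1000000 - 1799999   -> BUILTIN (S-1-5-32)
  slot 2: 1800000 - 2599999   -> S-1-5-21-... (the AD domain)

And since autorid maps roughly as slot_base + (RID mod rangesize), even a slot starting at 1000000 would only reproduce the old rid-style IDs (1000000 + RID) for RIDs below the 800000 rangesize.
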
Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue May 16 10:40:09 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 16 May 2017 10:40:09 +0100 Subject: [gpfsug-discuss] AFM Prefetch Missing Files Message-ID: I know it was said at the User group meeting last week that older versions of afm prefetch miss empty files and that this is now fixed in 4.2.2.3. We are in the middle of trying to migrate our files to a new filesystem, and since that was said I'm double checking for any mistakes etc. Anyway it looks like AFM prefetch also misses symlinks pointing to files that that don't exist. ie "dangling symlinks" or ones that point to files that either have not been created yet or have subsequently been deleted. or when files have been decompressed and a symlink extracted that points somewhere that is never going to exist. I'm still checking this, and as yet it does not look like its a data loss issue, but it could still cause things to not quiet work once the file migration is complete. Does anyone else know of any other types of files that might be missed and I need to be aware of? We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" using a gpfs policy to collect the list, we are using GPFS Multi-cluster to connect the two filesystems not NFS.... Thanks in advanced Peter Childs From service at metamodul.com Tue May 16 20:17:55 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Tue, 16 May 2017 21:17:55 +0200 (CEST) Subject: [gpfsug-discuss] Maximum network delay for a Quorum Buster node Message-ID: <1486746025.249506.1494962275357@email.1und1.de> An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Wed May 17 12:26:44 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 17 May 2017 11:26:44 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Hi All, I help to run a gpfs cluster at the Met Office, Exeter, UK. The cluster is running GPFS 4.2.2.2, it's used with slurm for batch work - primarily for postprocessing weather and climate change model data generated from our HPC. We currently have 8 NSD nodes with approx 3PB of storage with 70+ client nodes. Kind Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neil.wilson at metoffice.gov.uk Wed May 17 12:44:01 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 17 May 2017 11:44:01 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: Hello all, Does anyone have any experience with troubleshooting the new GPFS GUI? I've got it up and running but have a few weird problems with it... Maybe someone can help or point me in the right direction? 1. It keeps generating an alert saying that the cluster is down, when it isn't?? Event name: gui_cluster_down Component: GUI Entity type: Node Entity name: Event time: 17/05/2017 12:19:29 Message: The GUI detected that the cluster is down. Description: The GUI checks the cluster state. Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. User action: Check why the cluster lost quorum. Reporting node: Event type: Active health state of an entity which is monitored by the system. 2. It is collecting sensor data from the NSD nodes without any issue, but it won't collect sensor data from any of the client nodes? I have the pmsensors package installed on all the nodes in question , the service is enabled and running - the logs showing that it has connected to the collector. However in the GUI it just says "Performance collector did not return any data" 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. Would be great if anyone has any experience or ideas on how to troubleshoot this! Thanks Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed May 17 12:58:15 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Wed, 17 May 2017 07:58:15 -0400 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: References: Message-ID: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> I have issues as well with the gui. The issue that I had most similar to yours came about because I had installed the collector RPM and enabled collectors on two server nodes, but the GUI was only getting data from one of them. Each client randomly selected a collector to deliver data to. So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. Other issue I mentioned before is that it is totally confused about how many nodes are in the cluster (thinks 21, with 3 unhealthy) when there are only 12 nodes in all, all healthy. The nodes dashboard never finishes loading, and no means of digging deeper (text based info) to find out why it is wedged. ? ddj > On May 17, 2017, at 7:44 AM, Wilson, Neil wrote: > > Hello all, > > Does anyone have any experience with troubleshooting the new GPFS GUI? > I?ve got it up and running but have a few weird problems with it... > Maybe someone can help or point me in the right direction? > > 1. It keeps generating an alert saying that the cluster is down, when it isn?t?? > > Event name: > gui_cluster_down > Component: > GUI > Entity type: > Node > Entity name: > Event time: > 17/05/2017 12:19:29 > Message: > The GUI detected that the cluster is down. > Description: > The GUI checks the cluster state. > Cause: > The GUI calculated that an insufficient amount of quorum nodes is up and running. > User action: > Check why the cluster lost quorum. 
> Reporting node: > Event type: > Active health state of an entity which is monitored by the system. > > 2. It is collecting sensor data from the NSD nodes without any issue, but it won?t collect sensor data from any of the client nodes? > I have the pmsensors package installed on all the nodes in question , the service is enabled and running ? the logs showing that it has connected to the collector. > However in the GUI it just says ?Performance collector did not return any data? > > 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. > > > Would be great if anyone has any experience or ideas on how to troubleshoot this! > > Thanks > Neil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed May 17 13:23:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 May 2017 12:23:48 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: I don?t run the GUI in production, so I can?t comment on those issues specifically. I have been running a federated collector cluster for some time and it?s been working as expected. I?ve been using the Zimon-Grafana bridge code to look at GPFS performance stats. The other part of this is the mmhealth/mmsysmonitor process that reports events. It?s been problematic for me, especially in larger clusters (400+ nodes). The mmsysmonitor process is overloading the master node (the cluster manager) with too many ?heartbeats? and ends up causing lots of issues and log messages. Evidently this is something IBM is aware of (at the 4.2.2-2 level) and they have fixes coming out in 4.2.3 PTF1. I ended up disabling the cluster wide collection of health stats to prevent the cluster manager issues. However, be aware that CES depends on the mmhealth data so tinkering with the config make cause other issues if you use CES. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "David D. Johnson" Reply-To: gpfsug main discussion list Date: Wednesday, May 17, 2017 at 6:58 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS GUI So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Wed May 17 17:00:12 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 17 May 2017 18:00:12 +0200 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> References: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> Message-ID: Hello all, if multiple collectors should work together in a federation, the collector peers need to he specified in the ZimonCollectors.cfg. The GUI will see data from all collectors if federation is set up. See documentation below in the KC (works in 4.2.2 and 4.2.3 alike): https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_federation.htm For the issue related to the nodes count, can you contact me per PN? 
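To make the federation setup described above concrete: the peer list is declared in the collector configuration (typically /opt/IBM/zimon/ZIMonCollector.cfg, a path assumed from a default install) on every collector node, after which each collector can answer queries for the other's data and the GUI sees the whole cluster regardless of which collector a given sensor reports to. A minimal sketch of the stanza, with host names and port being placeholders rather than anything from this thread:

  peers = {
      host = "collector1.example.com"
      port = "9085"
  }, {
      host = "collector2.example.com"
      port = "9085"
  }

Restarting the pmcollector service on the collector nodes after the change is normally needed before the federation takes effect.
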
Mit freundlichen Gr??en / Kind regards Markus Rohwedder IBM Spectrum Scale GUI Development From: "David D. Johnson" To: gpfsug main discussion list Date: 17.05.2017 13:59 Subject: Re: [gpfsug-discuss] GPFS GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org I have issues as well with the gui. The issue that I had most similar to yours came about because I had installed the collector RPM and enabled collectors on two server nodes, but the GUI was only getting data from one of them. Each client randomly selected a collector to deliver data to. So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. Other issue I mentioned before is that it is totally confused about how many nodes are in the cluster (thinks 21, with 3 unhealthy) when there are only 12 nodes in all, all healthy. The nodes dashboard never finishes loading, and no means of digging deeper (text based info) to find out why it is wedged. ? ddj On May 17, 2017, at 7:44 AM, Wilson, Neil < neil.wilson at metoffice.gov.uk> wrote: Hello all, Does anyone have any experience with troubleshooting the new GPFS GUI? I?ve got it up and running but have a few weird problems with it... Maybe someone can help or point me in the right direction? 1. It keeps generating an alert saying that the cluster is down, when it isn?t?? Event name: gui_cluster_down Component: GUI Entity type: Node Entity name: Event time: 17/05/2017 12:19:29 Message: The GUI detected that the cluster is down. Description: The GUI checks the cluster state. Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. User action: Check why the cluster lost quorum. Reporting node: Event type: Active health state of an entity which is monitored by the system. 2. It is collecting sensor data from the NSD nodes without any issue, but it won?t collect sensor data from any of the client nodes? I have the pmsensors package installed on all the nodes in question , the service is enabled and running ? the logs showing that it has connected to the collector. However in the GUI it just says ?Performance collector did not return any data? 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. Would be great if anyone has any experience or ideas on how to troubleshoot this! Thanks Neil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From carlz at us.ibm.com Wed May 17 17:11:40 2017 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 17 May 2017 16:11:40 +0000 Subject: [gpfsug-discuss] Brief survey on GPFS / Scale usage from Scale Development Message-ID: An HTML attachment was scrubbed... 
URL: From Christian.Fey at sva.de Wed May 17 20:09:42 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Wed, 17 May 2017 19:09:42 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration In-Reply-To: References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: <310fef91208741b0b8059e805077f40e@sva.de> Hi, we have an existing filesystem and want to move from homebrew Samba/CTDB to CES. Since there is a lot of data in it, relabeling / migrating is not an option. FS stays the same, only nodes that share the FS change. There is an option to change the range (delete the existing ranges, set the new ones) with "net idmap set range" but in my Lab setup I was not successful in changing it. --cut-- [root at gpfs4n1 src]# /usr/lpp/mmfs/bin/net idmap set range 0 S-1-5-21-123456789-... Failed to save domain mapping: NT_STATUS_INVALID_PARAMETER --cut-- Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Varun Mittal3 Gesendet: Montag, 15. Mai 2017 20:39 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for "Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster w]"Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The ol From: "Fey, Christian" > To: gpfsug main discussion list > Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? 
Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From Christian.Fey at sva.de Wed May 17 20:37:36 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Wed, 17 May 2017 19:37:36 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: <38b79da90bfc4c549c5971f06cfaf5e5@sva.de> I just got the information that there is a debugging switch for the "net" commands (-d10). Looks like the issue with setting the ranges is caused by my lab setup (complains that the ranges are still present). I will try again with a scratched config and report back. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: Fey, Christian Gesendet: Mittwoch, 17. Mai 2017 21:10 An: gpfsug main discussion list Betreff: AW: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi, we have an existing filesystem and want to move from homebrew Samba/CTDB to CES. Since there is a lot of data in it, relabeling / migrating is not an option. FS stays the same, only nodes that share the FS change. There is an option to change the range (delete the existing ranges, set the new ones) with "net idmap set range" but in my Lab setup I was not successful in changing it. --cut-- [root at gpfs4n1 src]# /usr/lpp/mmfs/bin/net idmap set range 0 S-1-5-21-123456789-... Failed to save domain mapping: NT_STATUS_INVALID_PARAMETER --cut-- Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Varun Mittal3 Gesendet: Montag, 15. Mai 2017 20:39 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? 
What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for "Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster w]"Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The ol From: "Fey, Christian" > To: gpfsug main discussion list > Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed May 17 21:44:47 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 16:44:47 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Message-ID: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? 
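(For reference, a whole-filesystem run along these lines works fine for me -- same helper node and options as the per-fileset attempts listed below, just pointed at the filesystem instead of a fileset:

mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2

It is only when I point mmbackup at one of the filesets that it fails.)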
Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From luis.bolinches at fi.ibm.com Wed May 17 21:49:35 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 17 May 2017 23:49:35 +0300 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> Message-ID: Hi have you tried to add exceptions on the TSM client config file? Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 17-05-17 23:44 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 
These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 00:48:58 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 19:48:58 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> Message-ID: <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Quoting "Luis Bolinches" : > Hi > > have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. 
> > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Thu May 18 02:43:29 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 21:43:29 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. 
-------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : > Quoting "Luis Bolinches" : > >> Hi >> >> have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > >> >> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked >> on /IBM/GPFS/FSET1 >> >> dsm.sys >> ... >> >> DOMAIN /IBM/GPFS >> EXCLUDE.DIR /IBM/GPFS/FSET1 >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" >> Date: 17-05-17 23:44 >> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >> * project3 >> * scratch3 >> * sysadmin3 >> >> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >> have no need or space to include *scratch3* on TSM. >> >> Question: how to craft the mmbackup command to backup >> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >> >> Below are 3 types of errors: >> >> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> >> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >> dependent fileset sysadmin3 is not supported >> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> >> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope filesystem --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. 
>> >> These examples don't really cover my case: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >> Thanks >> Jaime >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Thu May 18 07:09:31 2017 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Thu, 18 May 2017 06:09:31 +0000 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu May 18 07:09:33 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 May 2017 06:09:33 +0000 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... 
URL: From p.childs at qmul.ac.uk Thu May 18 10:08:20 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 May 2017 10:08:20 +0100 Subject: [gpfsug-discuss] AFM Prefetch Missing Files In-Reply-To: References: Message-ID: Further investigation and checking says 4.2.1 afmctl prefetch is missing empty directories (not files as said previously) and noted by the update in 4.2.2.3. However I've found it is also missing symlinks both dangling (pointing to files that don't exist) and not. I can't see any actual data loss which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset) It looks like this should catch everything, But I'm wondering if anyone else has noticed any other things afmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: > I know it was said at the User group meeting last week that older > versions of afm prefetch miss empty files and that this is now fixed > in 4.2.2.3. > > We are in the middle of trying to migrate our files to a new > filesystem, and since that was said I'm double checking for any > mistakes etc. > > Anyway it looks like AFM prefetch also misses symlinks pointing to > files that that don't exist. ie "dangling symlinks" or ones that point > to files that either have not been created yet or have subsequently > been deleted. or when files have been decompressed and a symlink > extracted that points somewhere that is never going to exist. > > I'm still checking this, and as yet it does not look like its a data > loss issue, but it could still cause things to not quiet work once the > file migration is complete. > > Does anyone else know of any other types of files that might be missed > and I need to be aware of? > > We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" > using a gpfs policy to collect the list, we are using GPFS > Multi-cluster to connect the two filesystems not NFS.... > > Thanks in advanced > > > Peter Childs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From neil.wilson at metoffice.gov.uk Thu May 18 10:24:53 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Thu, 18 May 2017 09:24:53 +0000 Subject: [gpfsug-discuss] AFM Prefetch Missing Files In-Reply-To: References: Message-ID: We recently migrated several hundred TB from an Isilon cluster to our GPFS cluster using AFM using NFS gateways mostly using 4.2.2.2 , the main thing we noticed was that it would not migrate empty directories - we worked around the issue by getting a list of the missing directories and running it through a simple script that cd's into each directory then lists the empty directory. I didn't come across any issues with symlinks not being prefetched, just the directories. Regards Neil Wilson -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 18 May 2017 10:08 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] AFM Prefetch Missing Files Further investigation and checking says 4.2.1 afmctl prefetch is missing empty directories (not files as said previously) and noted by the update in 4.2.2.3. 
However I've found it is also missing symlinks both dangling (pointing to files that don't exist) and not. I can't see any actual data loss which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset) It looks like this should catch everything, But I'm wondering if anyone else has noticed any other things afmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: > I know it was said at the User group meeting last week that older > versions of afm prefetch miss empty files and that this is now fixed > in 4.2.2.3. > > We are in the middle of trying to migrate our files to a new > filesystem, and since that was said I'm double checking for any > mistakes etc. > > Anyway it looks like AFM prefetch also misses symlinks pointing to > files that that don't exist. ie "dangling symlinks" or ones that point > to files that either have not been created yet or have subsequently > been deleted. or when files have been decompressed and a symlink > extracted that points somewhere that is never going to exist. > > I'm still checking this, and as yet it does not look like its a data > loss issue, but it could still cause things to not quiet work once the > file migration is complete. > > Does anyone else know of any other types of files that might be missed > and I need to be aware of? > > We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" > using a gpfs policy to collect the list, we are using GPFS > Multi-cluster to connect the two filesystems not NFS.... > > Thanks in advanced > > > Peter Childs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu May 18 14:33:29 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 09:33:29 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset.... If you didn't say otherwise, inodes come from the default "root" fileset.... Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned.... 
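(To make the distinction concrete -- just an illustration, not taken from your system: an independent fileset is the one you create with its own inode space, roughly

mmcrfileset sgfs1 example_indep --inode-space new --inode-limit 1000000
mmlinkfileset sgfs1 example_indep -J /gpfs/sgfs1/example_indep

whereas a plain "mmcrfileset sgfs1 example_dep" with no --inode-space option gives you a dependent fileset whose inodes are allocated out of the root inode space. mmlsfileset sgfs1 -L will show you which kind each fileset is in its inode-space column.)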
BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. ----- Original message ----- From: "Jaime Pinto" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. -------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : > Quoting "Luis Bolinches" : > >> Hi >> >> have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. 
> > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > >> >> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked >> on /IBM/GPFS/FSET1 >> >> dsm.sys >> ... >> >> DOMAIN /IBM/GPFS >> EXCLUDE.DIR /IBM/GPFS/FSET1 >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" >> Date: 17-05-17 23:44 >> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >> * project3 >> * scratch3 >> * sysadmin3 >> >> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >> have no need or space to include *scratch3* on TSM. >> >> Question: how to craft the mmbackup command to backup >> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >> >> Below are 3 types of errors: >> >> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> >> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >> dependent fileset sysadmin3 is not supported >> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> >> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope filesystem --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> These examples don't really cover my case: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >> Thanks >> Jaime >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 14:58:51 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 09:58:51 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that are > in a separable range of inode numbers - this allows GPFS to efficiently do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? 
> > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor ESS, > so anyone in this list feel free to give feedback on that page people with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > >> Quoting "Luis Bolinches" : >> >>> Hi >>> >>> have you tried to add exceptions on the TSM client config file? >> >> Hey Luis, >> >> That would work as well (mechanically), however it's not elegant or >> efficient. When you have over 1PB and 200M files on scratch it will >> take many hours and several helper nodes to traverse that fileset just >> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >> Considering that I want to keep project and sysadmin on different >> domains then it's much worst, since we have to traverse and exclude >> scratch & (project|sysadmin) twice, once to capture sysadmin and again >> to capture project. 
>> >> If I have to use exclusion rules it has to rely sole on gpfs rules, and >> somehow not traverse scratch at all. >> >> I suspect there is a way to do this properly, however the examples on >> the gpfs guide and other references are not exhaustive. They only show >> a couple of trivial cases. >> >> However my situation is not unique. I suspect there are may facilities >> having to deal with backup of HUGE filesets. >> >> So the search is on. >> >> Thanks >> Jaime >> >> >> >> >>> >>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked >>> on /IBM/GPFS/FSET1 >>> >>> dsm.sys >>> ... >>> >>> DOMAIN /IBM/GPFS >>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" > >>> Date: 17-05-17 23:44 >>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>> * project3 >>> * scratch3 >>> * sysadmin3 >>> >>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>> have no need or space to include *scratch3* on TSM. >>> >>> Question: how to craft the mmbackup command to backup >>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>> >>> Below are 3 types of errors: >>> >>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> >>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>> dependent fileset sysadmin3 is not supported >>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> >>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope filesystem --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> These examples don't really cover my case: >>> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >>> >>> >>> Thanks >>> Jaime >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edellä ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From p.childs at qmul.ac.uk Thu May 18 15:12:05 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 May 2017 15:12:05 +0100 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> As I understand it, mmbackup calls mmapplypolicy, so this applies to mmapplypolicy too. mmapplypolicy scans the metadata (the inode file) as requested, depending on the query supplied. You can ask mmapplypolicy to scan a fileset, an inode space or the whole filesystem. If scanning a fileset, it scans the inode space that the fileset depends on, covering all files in that fileset.
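(Roughly what that looks like, from memory -- the paths and policy file here are only examples:

mmapplypolicy /gpfs/sgfs1/sysadmin3 -P rules.pol --scope fileset -I defer -f /tmp/sysadmin3

as opposed to pointing it at the filesystem root with --scope filesystem, which has to consider every inode space in the filesystem.)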
Smaller inode spaces hence less to scan, hence its faster to use an independent filesets, you get a list of what to process quicker. Another advantage is that once an inode is allocated you can't deallocate it, however you can delete independent filesets and hence deallocate the inodes, so if you have a task which has losts and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them work on them and then blow them away afterwards. I like independent filesets I'm guessing the only reason dependant filesets are used by default is history..... Peter On 18/05/17 14:58, Jaime Pinto wrote: > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? > > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >> think >> and try to read that as "inode space". >> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes >> that are >> in a separable range of inode numbers - this allows GPFS to >> efficiently do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be >> represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >> errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. 
It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people >> with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >> >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. >> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. >> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... 
>>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> These examples don't really cover my case: >>>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. 
>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From a.g.richmond at leeds.ac.uk Thu May 18 15:22:55 2017 From: a.g.richmond at leeds.ac.uk (Aidan Richmond) Date: Thu, 18 May 2017 15:22:55 +0100 Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain Message-ID: Hello I'm trying to join an AD domain for SMB and NFS protocol sharing but I keep getting a "Failed to generate the kerberos keytab file" error. The command I'm running is /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type ad --netbios-name @name@ --servers @adserver@ --user-name @username@ --idmap-role master --enable-nfs-kerberos --unixmap-domains "DS(1000-9999999)" A correct keytab does appears to be created on the host I run this on (one of two protocol nodes) but not on the other one. -- Aidan Richmond Apple/Unix Support Officer, IT Garstang 10.137 Faculty of Biological Sciences University of Leeds Clarendon Way LS2 9JT Tel:0113 3434252 a.g.richmond at leeds.ac.uk From makaplan at us.ibm.com Thu May 18 15:23:30 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 10:23:30 -0400 Subject: [gpfsug-discuss] What is an independent fileset? 
was: mmbackup with fileset : scope errors In-Reply-To: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset .... However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that are > in a separable range of inode numbers - this allows GPFS to efficiently do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... 
If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor ESS, > so anyone in this list feel free to give feedback on that page people with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > >> Quoting "Luis Bolinches" : >> >>> Hi >>> >>> have you tried to add exceptions on the TSM client config file? >> >> Hey Luis, >> >> That would work as well (mechanically), however it's not elegant or >> efficient. When you have over 1PB and 200M files on scratch it will >> take many hours and several helper nodes to traverse that fileset just >> to be negated by TSM. In fact exclusion on TSM are just as inefficient. 
>> Considering that I want to keep project and sysadmin on different >> domains then it's much worst, since we have to traverse and exclude >> scratch & (project|sysadmin) twice, once to capture sysadmin and again >> to capture project. >> >> If I have to use exclusion rules it has to rely sole on gpfs rules, and >> somehow not traverse scratch at all. >> >> I suspect there is a way to do this properly, however the examples on >> the gpfs guide and other references are not exhaustive. They only show >> a couple of trivial cases. >> >> However my situation is not unique. I suspect there are may facilities >> having to deal with backup of HUGE filesets. >> >> So the search is on. >> >> Thanks >> Jaime >> >> >> >> >>> >>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked >>> on /IBM/GPFS/FSET1 >>> >>> dsm.sys >>> ... >>> >>> DOMAIN /IBM/GPFS >>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" > >>> Date: 17-05-17 23:44 >>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>> * project3 >>> * scratch3 >>> * sysadmin3 >>> >>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>> have no need or space to include *scratch3* on TSM. >>> >>> Question: how to craft the mmbackup command to backup >>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>> >>> Below are 3 types of errors: >>> >>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> >>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>> dependent fileset sysadmin3 is not supported >>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> >>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope filesystem --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> These examples don't really cover my case: >>> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >>> >>> >>> Thanks >>> Jaime >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu May 18 15:24:17 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 18 May 2017 10:24:17 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> Message-ID: <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Here is one big reason independent filesets are problematic: A5.13: Table 43. 
Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University > On May 18, 2017, at 10:12 AM, Peter Childs wrote: > > As I understand it, > > mmbackup calls mmapplypolicy so this stands for mmapplypolicy too..... > > mmapplypolicy scans the metadata inodes (file) as requested depending on the query supplied. > > You can ask mmapplypolicy to scan a fileset, inode space or filesystem. > > If scanning a fileset it scans the inode space that fileset is dependant on, for all files in that fileset. Smaller inode spaces hence less to scan, hence its faster to use an independent filesets, you get a list of what to process quicker. > > Another advantage is that once an inode is allocated you can't deallocate it, however you can delete independent filesets and hence deallocate the inodes, so if you have a task which has losts and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them work on them and then blow them away afterwards. > > I like independent filesets I'm guessing the only reason dependant filesets are used by default is history..... > > > Peter > > > On 18/05/17 14:58, Jaime Pinto wrote: >> Thanks for the explanation Mark and Luis, >> >> It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. >> >> When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one way or another. >> >> I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that are >>> in a separable range of inode numbers - this allows GPFS to efficiently do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset.... If you didn't say otherwise, inodes come from the >>> default "root" fileset.... >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? 
Stay tuned.... >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss at spectrumscale.org >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor ESS, >>> so anyone in this list feel free to give feedback on that page people with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the data from that fileset to a new >>> independent fileset (multiple ways to do that) or use the TSM client >>> config. >>> >>> ----- Original message ----- >>> From: "Jaime Pinto" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: "gpfsug main discussion list" , >>> "Jaime Pinto" >>> Cc: >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Date: Thu, May 18, 2017 4:43 AM >>> >>> There is hope. See reference link below: >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >>> >>> >>> The issue has to do with dependent vs. independent filesets, something >>> I didn't even realize existed until now. Our filesets are dependent >>> (for no particular reason), so I have to find a way to turn them into >>> independent. >>> >>> The proper option syntax is "--scope inodespace", and the error >>> message actually flagged that out, however I didn't know how to >>> interpret what I saw: >>> >>> >>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> -------------------------------------------------------- >>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>> 21:27:43 EDT 2017. >>> -------------------------------------------------------- >>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>> fileset sysadmin3 is not supported >>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> -------------------------------------------------------- >>> >>> Will post the outcome. >>> Jaime >>> >>> >>> >>> Quoting "Jaime Pinto" : >>> >>>> Quoting "Luis Bolinches" : >>>> >>>>> Hi >>>>> >>>>> have you tried to add exceptions on the TSM client config file? >>>> >>>> Hey Luis, >>>> >>>> That would work as well (mechanically), however it's not elegant or >>>> efficient. When you have over 1PB and 200M files on scratch it will >>>> take many hours and several helper nodes to traverse that fileset just >>>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. 
>>>> Considering that I want to keep project and sysadmin on different >>>> domains then it's much worst, since we have to traverse and exclude >>>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>>> to capture project. >>>> >>>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>>> somehow not traverse scratch at all. >>>> >>>> I suspect there is a way to do this properly, however the examples on >>>> the gpfs guide and other references are not exhaustive. They only show >>>> a couple of trivial cases. >>>> >>>> However my situation is not unique. I suspect there are may facilities >>>> having to deal with backup of HUGE filesets. >>>> >>>> So the search is on. >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>>> >>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>> linked >>>>> on /IBM/GPFS/FSET1 >>>>> >>>>> dsm.sys >>>>> ... >>>>> >>>>> DOMAIN /IBM/GPFS >>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>> >>>>> >>>>> From: "Jaime Pinto" >>>>> To: "gpfsug main discussion list" >>> >>>>> Date: 17-05-17 23:44 >>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> >>>>> >>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>> * project3 >>>>> * scratch3 >>>>> * sysadmin3 >>>>> >>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>> have no need or space to include *scratch3* on TSM. >>>>> >>>>> Question: how to craft the mmbackup command to backup >>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>> >>>>> Below are 3 types of errors: >>>>> >>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>> dependent fileset sysadmin3 is not supported >>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>> fileset level backup. exit 1 >>>>> >>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> These examples don't really cover my case: >>>>> >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >>> >>>>> >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>>> Toronto. 
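
On the sizing point made earlier in the thread (independent filesets can hit their inode limit and block file creation): a rough sketch of how to watch and raise those limits, with the caveat that the exact columns and option spelling should be checked against the mmlsfileset/mmchfileset man pages for your release:

# show each fileset's inode space plus its maximum and allocated inodes
mmlsfileset sgfs1 -L

# raise the limit on an independent fileset that is nearly full;
# the fileset name and the 2M/1M figures here are purely illustrative
mmchfileset sgfs1 somefileset --inode-limit 2000000:1000000

The root fileset behaves the same way, except that its ceiling is the filesystem-wide setting (mmchfs --inodelimit), which is the "one number to watch per filesystem" mentioned above.
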
>>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>> Oy IBM Finland Ab >>>>> PL 265, 00101 Helsinki, Finland >>>>> Business ID, Y-tunnus: 0195876-3 >>>>> Registered in Finland >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu May 18 15:32:42 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:32:42 +0000 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Message-ID: Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of David D. Johnson Sent: 18 May 2017 15:24 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Here is one big reason independent filesets are problematic: A5.13: Table 43. Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 18 15:36:33 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:36:33 +0000 Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain In-Reply-To: References: Message-ID: It's crappy, I had to put the command in 10+ times before it would work. Just keep at it (that's my takeaway, sorry I'm not that technical when it comes to Kerberos). Could be a domain replication thing. Is time syncing properly across all your CES nodes? Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aidan Richmond Sent: 18 May 2017 15:23 To: gpfsug main discussion list Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain Hello I'm trying to join an AD domain for SMB and NFS protocol sharing but I keep getting a "Failed to generate the kerberos keytab file" error. The command I'm running is /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type ad --netbios-name @name@ --servers @adserver@ --user-name @username@ --idmap-role master --enable-nfs-kerberos --unixmap-domains "DS(1000-9999999)" A correct keytab does appears to be created on the host I run this on (one of two protocol nodes) but not on the other one. 
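
Before simply re-running the join, a few sanity checks on *both* protocol nodes can save some of those 10+ attempts. This is a rough checklist rather than an authoritative procedure, and the keytab path is an assumption:

# clocks must agree with the AD domain controllers (Kerberos allows ~5 minutes of skew)
date; ntpq -p

# confirm each CES node can resolve and reach the AD server
host @adserver@ && ping -c1 @adserver@

# after a (re)run of mmuserauth, check that the keytab exists on every node
klist -k /etc/krb5.keytab

# and ask Scale to verify the file-auth configuration and server reachability
/usr/lpp/mmfs/bin/mmuserauth service check --data-access-method file --server-reachability

If the keytab only ever appears on the node where the command was run, that points at the configuration distribution between the CES nodes rather than at AD itself, which fits the "keep at it" experience above.
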
-- Aidan Richmond Apple/Unix Support Officer, IT Garstang 10.137 Faculty of Biological Sciences University of Leeds Clarendon Way LS2 9JT Tel:0113 3434252 a.g.richmond at leeds.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu May 18 15:47:59 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 18 May 2017 10:47:59 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Message-ID: <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn?t a single point of contention for creating files. I?m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I?m sure that limit could be changed eventually, there?s probably some efficiencies in not making it much bigger than it needs to be. I don?t know if it would take an on-disk format change or not. So how do you decide that a use case gets it?s own fileset, and do you just always use independent or is there an evaluation? I?m just curious because I like to understand lots of different points of view ? feel free to tell me to go away. :) -- Stephen > On May 18, 2017, at 10:32 AM, Sobey, Richard A > wrote: > > Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. > > I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ]On Behalf Of David D. Johnson > Sent: 18 May 2017 15:24 > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors > > Here is one big reason independent filesets are problematic: > A5.13: > Table 43. Maximum number of filesets > Version of GPFS > Maximum Number of Dependent Filesets > Maximum Number of Independent Filesets > IBM Spectrum Scale V4 > 10,000 > 1,000 > GPFS V3.5 > 10,000 > 1,000 > Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. > If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. > This is true of the root namespace as well, but there?s only one number to watch per filesystem. > > ? 
ddj > Dave Johnson > Brown University > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 18 15:58:20 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:58:20 +0000 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> Message-ID: So it could be that we didn?t really know what we were doing when our system was installed (and still don?t by some of the messages I post *cough*) but basically I think we?re quite similar to other shops where we resell GPFS to departmental users internally and it just made some sense to break down each one into a fileset. We can then snapshot each one individually (7402 snapshots at the moment) and apply quotas. I know your question was why independent and not dependent ? but I honestly don?t know. I assume it?s to do with not crossing the streams if you?ll excuse the obvious film reference. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stephen Ulmer Sent: 18 May 2017 15:48 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn?t a single point of contention for creating files. I?m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I?m sure that limit could be changed eventually, there?s probably some efficiencies in not making it much bigger than it needs to be. I don?t know if it would take an on-disk format change or not. So how do you decide that a use case gets it?s own fileset, and do you just always use independent or is there an evaluation? I?m just curious because I like to understand lots of different points of view ? feel free to tell me to go away. :) -- Stephen On May 18, 2017, at 10:32 AM, Sobey, Richard A > wrote: Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org]On Behalf Of David D. Johnson Sent: 18 May 2017 15:24 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] What is an independent fileset? 
was: mmbackup with fileset : scope errors Here is one big reason independent filesets are problematic: A5.13: Table 43. Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Thu May 18 16:15:30 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 18 May 2017 16:15:30 +0100 Subject: [gpfsug-discuss] Save the date SSUG 2018 - April 18th/19th 2018 Message-ID: Hi All, A date for your diary, #SSUG18 in the UK will be taking place on: April 18th, 19th 2018 Please mark it in your diaries now :-) We'll confirm other details etc nearer the time, but date is confirmed. Simon From john.hearns at asml.com Thu May 18 16:23:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 18 May 2017 15:23:29 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Good afternoon all, my name is John Hearns. I am currently working with the HPC Team at ASML in the Netherlands, the market sector is manufacturing. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 17:36:46 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 12:36:46 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. 
It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
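
One way to sanity-check a draft rule file like the one above before wiring it into mmbackup -P is a dry run with mmapplypolicy, which evaluates the rules without acting on any files. The path of the saved rule file is assumed here, and the helper node and work directory are the ones used elsewhere in this thread:

# evaluate the rules only; nothing is backed up or handed to the external list
mmapplypolicy /gpfs/sgfs1 -P /tmp/fileset-rules.pol -I test -L 2 -N tsm-helper1-ib0 -s /dev/shm

Mistakes in the FOR FILESET(...) clauses then show up in the rule evaluation summary before any real backup run depends on them.
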
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think >> and try to read that as "inode space". >> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes that > are >> in a separable range of inode numbers - this allows GPFS to efficiently > do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be > represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people > with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. 
>> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. >> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... >>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. 
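
A quick way to see which of the three filesets are actually independent (and therefore eligible for --scope inodespace) is to list them together with their inode spaces -- a sketch, since the column layout differs slightly between releases:

# dependent filesets share the inode space of their parent (usually inode
# space 0, the root fileset); independent ones have their own inode space
# and their own MaxInodes/AllocInodes values
mmlsfileset sgfs1 -L
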
>>>> >>>> These examples don't really cover my case: >>>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. 
(MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From makaplan at us.ibm.com Thu May 18 18:05:59 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 13:05:59 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Message-ID: 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air.... Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan..... Plan your Test and Test your Plan.... Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM.... otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. 
Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? > > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think >> and try to read that as "inode space". 
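A draft rules file like the one above can be sanity-checked with a dry run of mmapplypolicy before it is handed to mmbackup. The sketch below assumes the rules are saved as /tmp/sysadmin3.rules (an illustrative path) and uses the filesystem and fileset names from this thread; the exact behaviour of -I test with EXTERNAL LIST rules is worth confirming on a small test run first:

# Dry run: evaluate the rules and report candidate files, change nothing
mmapplypolicy /gpfs/sgfs1 -P /tmp/sysadmin3.rules -I test -L 2

# Raise -L to see each candidate file and the rule that selected it
mmapplypolicy /gpfs/sgfs1 -P /tmp/sysadmin3.rules -I test -L 3

If the candidate list comes back with only sysadmin3 files, the same FOR FILESET()/EXCLUDE clauses can then be grafted onto the rules mmbackup generates, rather than written from scratch.
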
>> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes that > are >> in a separable range of inode numbers - this allows GPFS to efficiently > do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be > represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people > with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. >> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. 
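Whether a fileset is dependent or independent can also be checked up front, rather than discovered only when mmbackup rejects the --scope inodespace run. mmlsfileset with -L reports the inode space each fileset belongs to; a quick check, using the filesystem name from this thread:

# Independent filesets own their inode space; dependent filesets show the
# inode space of the fileset they were created inside (e.g. root)
mmlsfileset sgfs1 -L
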
>> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... >>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> These examples don't really cover my case: >>>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. 
(MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 20:02:46 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 15:02:46 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Message-ID: <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > thin air.... Capture the rules mmbackup creates and make small changes to > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > Plan.... Then do some dry run recoveries before you really "need" to do a > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. 
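Put together, the "capture and tweak" workflow discussed here might look roughly like the sketch below. The paths are illustrative, and the generated ruleset may only persist for the duration of the run unless the DEBUG variables mentioned elsewhere in this thread are set:

# 1. Trial run over the whole filesystem. mmbackup writes its auto-generated
#    policy under /var/mmfs/mmbackup/ (.mmbackupRules*, per Jez Tucker's note
#    in this thread); higher -L values also show which rules were applied.
mmbackup /gpfs/sgfs1 -t incremental -s /dev/shm --tsm-errorlog $logfile -L 4

# 2. Copy the generated rules aside (adjust the glob to the actual file name)
#    and add the fileset clauses, e.g. FOR FILESET('sysadmin3') on the
#    selection rules plus an EXCLUDE rule FOR FILESET('scratch3','project3').
cp /var/mmfs/mmbackup/.mmbackupRules* /root/mmbackup.rules.custom

# 3. Re-run with the customized rules.
mmbackup /gpfs/sgfs1 -t incremental -s /dev/shm --tsm-errorlog $logfile \
    -P /root/mmbackup.rules.custom -L 2
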
> > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. It would be nice to get comments from > somebody familiar with the inner works of mmbackup. > > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > >> Jaime, >> >> While we're waiting for the mmbackup expert to weigh in, notice that > the >> mmbackup command does have a -P option that allows you to provide a >> customized policy rules file. >> >> So... a fairly safe hack is to do a trial mmbackup run, capture the >> automatically generated policy file, and then augment it with FOR >> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for >> real with your customized policy file. >> >> mmbackup uses mmapplypolicy which by itself is happy to limit its >> directory scan to a particular fileset by using >> >> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope >> fileset .... >> >> However, mmbackup probably has other worries and for simpliciity and >> helping make sure you get complete, sensible backups, apparently has >> imposed some restrictions to preserve sanity (yours and our support > team! >> ;-) ) ... (For example, suppose you were doing incremental backups, >> starting at different paths each time? -- happy to do so, but when >> disaster strikes and you want to restore -- you'll end up confused > and/or >> unhappy!) >> >> "converting from one fileset to another" --- sorry there is no such > thing. >> Filesets are kinda like little filesystems within filesystems. Moving > a >> file from one fileset to another requires a copy operation. There is > no >> fast move nor hardlinking. >> >> --marc >> >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" > , >> "Marc A Kaplan" >> Date: 05/18/2017 09:58 AM >> Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: >> mmbackup with fileset : scope errors >> >> >> >> Thanks for the explanation Mark and Luis, >> >> It begs the question: why filesets are created as dependent by >> default, if the adverse repercussions can be so great afterward? Even >> in my case, where I manage GPFS and TSM deployments (and I have been >> around for a while), didn't realize at all that not adding and extra >> option at fileset creation time would cause me huge trouble with >> scaling later on as I try to use mmbackup. >> >> When you have different groups to manage file systems and backups that >> don't read each-other's manuals ahead of time then we have a really >> bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one >> way or another. 
>> >> I'm also hoping for a hint as to how to configure backup exclusion >> rules on the TSM side to exclude fileset traversing on the GPFS side. >> Is mmbackup smart enough (actually smarter than TSM client itself) to >> read the exclusion rules on the TSM configuration and apply them >> before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >> think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that >> are >>> in a separable range of inode numbers - this allows GPFS to efficiently >> do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be >> represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset.... If you didn't say otherwise, inodes come from the >>> default "root" fileset.... >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? Stay tuned.... >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you > may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss at spectrumscale.org >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >> errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor > ESS, >>> so anyone in this list feel free to give feedback on that page people >> with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the data from that fileset to a > new >>> independent fileset (multiple ways to do that) or use the TSM client >>> config. >>> >>> ----- Original message ----- >>> From: "Jaime Pinto" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: "gpfsug main discussion list" , >>> "Jaime Pinto" >>> Cc: >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Date: Thu, May 18, 2017 4:43 AM >>> >>> There is hope. See reference link below: >>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >>> >>> >>> The issue has to do with dependent vs. independent filesets, something >>> I didn't even realize existed until now. Our filesets are dependent >>> (for no particular reason), so I have to find a way to turn them into >>> independent. 
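For completeness, the "migrate to a new independent fileset" route that Luis and Marc describe might look roughly like the sequence below. This is only a sketch using the names from this thread: each step (especially the delete and rename) should be checked against the mmcrfileset / mmdelfileset / mmchfileset documentation and rehearsed somewhere safe first, and rsync is shown purely for illustration (GPFS NFSv4 ACLs may need a different copy tool).

# 1. Create a new independent fileset (its own inode space) and link it
mmcrfileset sgfs1 sysadmin3new --inode-space new
mmlinkfileset sgfs1 sysadmin3new -J /gpfs/sgfs1/sysadmin3new

# 2. Copy the data -- there is no in-place conversion, it is a real copy
rsync -aHAX /gpfs/sgfs1/sysadmin3/ /gpfs/sgfs1/sysadmin3new/

# 3. Retire the old dependent fileset and move the new one to the old path
mmunlinkfileset sgfs1 sysadmin3
mmdelfileset sgfs1 sysadmin3 -f          # -f: the old fileset still holds files
mmunlinkfileset sgfs1 sysadmin3new
mmchfileset sgfs1 sysadmin3new -j sysadmin3
mmlinkfileset sgfs1 sysadmin3 -J /gpfs/sgfs1/sysadmin3

# 4. The command that was rejected earlier should now be accepted
mmbackup /gpfs/sgfs1/sysadmin3 --scope inodespace -s /dev/shm --tsm-errorlog $logfile -L 2
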
>>> >>> The proper option syntax is "--scope inodespace", and the error >>> message actually flagged that out, however I didn't know how to >>> interpret what I saw: >>> >>> >>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> -------------------------------------------------------- >>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>> 21:27:43 EDT 2017. >>> -------------------------------------------------------- >>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>> fileset sysadmin3 is not supported >>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> -------------------------------------------------------- >>> >>> Will post the outcome. >>> Jaime >>> >>> >>> >>> Quoting "Jaime Pinto" : >>> >>>> Quoting "Luis Bolinches" : >>>> >>>>> Hi >>>>> >>>>> have you tried to add exceptions on the TSM client config file? >>>> >>>> Hey Luis, >>>> >>>> That would work as well (mechanically), however it's not elegant or >>>> efficient. When you have over 1PB and 200M files on scratch it will >>>> take many hours and several helper nodes to traverse that fileset just >>>> to be negated by TSM. In fact exclusion on TSM are just as > inefficient. >>>> Considering that I want to keep project and sysadmin on different >>>> domains then it's much worst, since we have to traverse and exclude >>>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>>> to capture project. >>>> >>>> If I have to use exclusion rules it has to rely sole on gpfs rules, > and >>>> somehow not traverse scratch at all. >>>> >>>> I suspect there is a way to do this properly, however the examples on >>>> the gpfs guide and other references are not exhaustive. They only show >>>> a couple of trivial cases. >>>> >>>> However my situation is not unique. I suspect there are may facilities >>>> having to deal with backup of HUGE filesets. >>>> >>>> So the search is on. >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>>> >>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>> linked >>>>> on /IBM/GPFS/FSET1 >>>>> >>>>> dsm.sys >>>>> ... >>>>> >>>>> DOMAIN /IBM/GPFS >>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>> >>>>> >>>>> From: "Jaime Pinto" >>>>> To: "gpfsug main discussion list" >>> >>>>> Date: 17-05-17 23:44 >>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> >>>>> >>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>> * project3 >>>>> * scratch3 >>>>> * sysadmin3 >>>>> >>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>> have no need or space to include *scratch3* on TSM. >>>>> >>>>> Question: how to craft the mmbackup command to backup >>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>> >>>>> Below are 3 types of errors: >>>>> >>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. 
>>>>> >>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>> dependent fileset sysadmin3 is not supported >>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>> fileset level backup. exit 1 >>>>> >>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> These examples don't really cover my case: >>>>> >>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>> >>>>> >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>>> Toronto. >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>> Oy IBM Finland Ab >>>>> PL 265, 00101 Helsinki, Finland >>>>> Business ID, Y-tunnus: 0195876-3 >>>>> Registered in Finland >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. 
>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jtucker at pixitmedia.com Thu May 18 20:32:54 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Thu, 18 May 2017 20:32:54 +0100 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Hi When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is > using as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. 
n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > >> 1. As I surmised, and I now have verification from Mr. mmbackup, >> mmbackup >> wants to support incremental backups (using what it calls its shadow >> database) and keep both your sanity and its sanity -- so mmbackup limits >> you to either full filesystem or full inode-space (independent fileset.) >> If you want to do something else, okay, but you have to be careful >> and be >> sure of yourself. IBM will not be able to jump in and help you if and >> when >> it comes time to restore and you discover that your backup(s) were not >> complete. >> >> 2. If you decide you're a big boy (or woman or XXX) and want to do some >> hacking ... Fine... But even then, I suggest you do the smallest hack >> that will mostly achieve your goal... >> DO NOT think you can create a custom policy rules list for mmbackup >> out of >> thin air.... Capture the rules mmbackup creates and make small >> changes to >> that -- >> And as with any disaster recovery plan..... Plan your Test and Test >> your >> Plan.... Then do some dry run recoveries before you really "need" to >> do a >> real recovery. >> >> I only even sugest this because Jaime says he has a huge filesystem with >> several dependent filesets and he really, really wants to do a partial >> backup, without first copying or re-organizing the filesets. >> >> HMMM.... otoh... if you have one or more dependent filesets that are >> smallish, and/or you don't need the backups -- create independent >> filesets, copy/move/delete the data, rename, voila. >> >> >> >> From: "Jaime Pinto" >> To: "Marc A Kaplan" >> Cc: "gpfsug main discussion list" >> Date: 05/18/2017 12:36 PM >> Subject: Re: [gpfsug-discuss] What is an independent fileset? >> was: >> mmbackup with fileset : scope errors >> >> >> >> Marc >> >> The -P option may be a very good workaround, but I still have to test >> it. >> >> I'm currently trying to craft the mm rule, as minimalist as possible, >> however I'm not sure about what attributes mmbackup expects to see. >> >> Below is my first attempt. It would be nice to get comments from >> somebody familiar with the inner works of mmbackup. >> >> Thanks >> Jaime >> >> >> /* A macro to abbreviate VARCHAR */ >> define([vc],[VARCHAR($1)]) >> >> /* Define three external lists */ >> RULE EXTERNAL LIST 'allfiles' EXEC >> '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' >> >> /* Generate a list of all files, directories, plus all other file >> system objects, >> like symlinks, named pipes, etc. 
Include the owner's id with each >> object and >> sort them by the owner's id */ >> >> RULE 'r1' LIST 'allfiles' >> DIRECTORIES_PLUS >> SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || >> vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) >> FROM POOL 'system' >> FOR FILESET('sysadmin3') >> >> /* Files in special filesets, such as those excluded, are never >> traversed >> */ >> RULE 'ExcSpecialFile' EXCLUDE >> FOR FILESET('scratch3','project3') >> >> >> >> >> >> Quoting "Marc A Kaplan" : >> >>> Jaime, >>> >>> While we're waiting for the mmbackup expert to weigh in, notice that >> the >>> mmbackup command does have a -P option that allows you to provide a >>> customized policy rules file. >>> >>> So... a fairly safe hack is to do a trial mmbackup run, capture the >>> automatically generated policy file, and then augment it with FOR >>> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup >> for >>> real with your customized policy file. >>> >>> mmbackup uses mmapplypolicy which by itself is happy to limit its >>> directory scan to a particular fileset by using >>> >>> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope >>> fileset .... >>> >>> However, mmbackup probably has other worries and for simpliciity and >>> helping make sure you get complete, sensible backups, apparently has >>> imposed some restrictions to preserve sanity (yours and our support >> team! >>> ;-) ) ... (For example, suppose you were doing incremental backups, >>> starting at different paths each time? -- happy to do so, but when >>> disaster strikes and you want to restore -- you'll end up confused >> and/or >>> unhappy!) >>> >>> "converting from one fileset to another" --- sorry there is no such >> thing. >>> Filesets are kinda like little filesystems within filesystems. Moving >> a >>> file from one fileset to another requires a copy operation. There is >> no >>> fast move nor hardlinking. >>> >>> --marc >>> >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" >> , >>> "Marc A Kaplan" >>> Date: 05/18/2017 09:58 AM >>> Subject: Re: [gpfsug-discuss] What is an independent fileset? >> was: >>> mmbackup with fileset : scope errors >>> >>> >>> >>> Thanks for the explanation Mark and Luis, >>> >>> It begs the question: why filesets are created as dependent by >>> default, if the adverse repercussions can be so great afterward? Even >>> in my case, where I manage GPFS and TSM deployments (and I have been >>> around for a while), didn't realize at all that not adding and extra >>> option at fileset creation time would cause me huge trouble with >>> scaling later on as I try to use mmbackup. >>> >>> When you have different groups to manage file systems and backups that >>> don't read each-other's manuals ahead of time then we have a really >>> bad recipe. >>> >>> I'm looking forward to your explanation as to why mmbackup cares one >>> way or another. >>> >>> I'm also hoping for a hint as to how to configure backup exclusion >>> rules on the TSM side to exclude fileset traversing on the GPFS side. >>> Is mmbackup smart enough (actually smarter than TSM client itself) to >>> read the exclusion rules on the TSM configuration and apply them >>> before traversing? >>> >>> Thanks >>> Jaime >>> >>> Quoting "Marc A Kaplan" : >>> >>>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >>> think >>>> and try to read that as "inode space". 
>>>> >>>> An "independent fileset" has all the attributes of an >>>> (older-fashioned) >>>> dependent fileset PLUS all of its files are represented by inodes that >>> are >>>> in a separable range of inode numbers - this allows GPFS to >>>> efficiently >>> do >>>> snapshots of just that inode-space (uh... independent fileset)... >>>> >>>> And... of course the files of dependent filesets must also be >>> represented >>>> by inodes -- those inode numbers are within the inode-space of >>>> whatever >>>> the containing independent fileset is... as was chosen when you >>>> created >>>> the fileset.... If you didn't say otherwise, inodes come from the >>>> default "root" fileset.... >>>> >>>> Clear as your bath-water, no? >>>> >>>> So why does mmbackup care one way or another ??? Stay tuned.... >>>> >>>> BTW - if you look at the bits of the inode numbers carefully --- you >> may >>>> not immediately discern what I mean by a "separable range of inode >>>> numbers" -- (very technical hint) you may need to permute the bit >>>> order >>>> before you discern a simple pattern... >>>> >>>> >>>> >>>> From: "Luis Bolinches" >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: gpfsug-discuss at spectrumscale.org >>>> Date: 05/18/2017 02:10 AM >>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >>> errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Hi >>>> >>>> There is no direct way to convert the one fileset that is dependent to >>>> independent or viceversa. >>>> >>>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots >> of >>>> definitions about GPFS ILM including filesets >>>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the >> only >>>> place that is explained but I honestly believe is a good single start >>>> point. It also needs an update as does nto have anything on CES nor >> ESS, >>>> so anyone in this list feel free to give feedback on that page people >>> with >>>> funding decisions listen there. >>>> >>>> So you are limited to either migrate the data from that fileset to a >> new >>>> independent fileset (multiple ways to do that) or use the TSM client >>>> config. >>>> >>>> ----- Original message ----- >>>> From: "Jaime Pinto" >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> To: "gpfsug main discussion list" , >>>> "Jaime Pinto" >>>> Cc: >>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Date: Thu, May 18, 2017 4:43 AM >>>> >>>> There is hope. See reference link below: >>>> >>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >> >> >>> >>>> >>>> >>>> The issue has to do with dependent vs. independent filesets, something >>>> I didn't even realize existed until now. Our filesets are dependent >>>> (for no particular reason), so I have to find a way to turn them into >>>> independent. >>>> >>>> The proper option syntax is "--scope inodespace", and the error >>>> message actually flagged that out, however I didn't know how to >>>> interpret what I saw: >>>> >>>> >>>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> -------------------------------------------------------- >>>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>>> 21:27:43 EDT 2017. 
>>>> -------------------------------------------------------- >>>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>>> fileset sysadmin3 is not supported >>>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> -------------------------------------------------------- >>>> >>>> Will post the outcome. >>>> Jaime >>>> >>>> >>>> >>>> Quoting "Jaime Pinto" : >>>> >>>>> Quoting "Luis Bolinches" : >>>>> >>>>>> Hi >>>>>> >>>>>> have you tried to add exceptions on the TSM client config file? >>>>> >>>>> Hey Luis, >>>>> >>>>> That would work as well (mechanically), however it's not elegant or >>>>> efficient. When you have over 1PB and 200M files on scratch it will >>>>> take many hours and several helper nodes to traverse that fileset >>>>> just >>>>> to be negated by TSM. In fact exclusion on TSM are just as >> inefficient. >>>>> Considering that I want to keep project and sysadmin on different >>>>> domains then it's much worst, since we have to traverse and exclude >>>>> scratch & (project|sysadmin) twice, once to capture sysadmin and >>>>> again >>>>> to capture project. >>>>> >>>>> If I have to use exclusion rules it has to rely sole on gpfs rules, >> and >>>>> somehow not traverse scratch at all. >>>>> >>>>> I suspect there is a way to do this properly, however the examples on >>>>> the gpfs guide and other references are not exhaustive. They only >>>>> show >>>>> a couple of trivial cases. >>>>> >>>>> However my situation is not unique. I suspect there are may >>>>> facilities >>>>> having to deal with backup of HUGE filesets. >>>>> >>>>> So the search is on. >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>>> linked >>>>>> on /IBM/GPFS/FSET1 >>>>>> >>>>>> dsm.sys >>>>>> ... >>>>>> >>>>>> DOMAIN /IBM/GPFS >>>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>>> >>>>>> >>>>>> From: "Jaime Pinto" >>>>>> To: "gpfsug main discussion list" >>>> >>>>>> Date: 17-05-17 23:44 >>>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope >>>>>> errors >>>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>>> >>>>>> >>>>>> >>>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>>> * project3 >>>>>> * scratch3 >>>>>> * sysadmin3 >>>>>> >>>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>>> have no need or space to include *scratch3* on TSM. >>>>>> >>>>>> Question: how to craft the mmbackup command to backup >>>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>>> >>>>>> Below are 3 types of errors: >>>>>> >>>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope >>>>>> filesystem >>>>>> cannot be specified at the same time. >>>>>> >>>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>>> dependent fileset sysadmin3 is not supported >>>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>>> fileset level backup. exit 1 >>>>>> >>>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope >>>>>> filesystem >>>>>> cannot be specified at the same time. 
>>>>>> >>>>>> These examples don't really cover my case: >>>>>> >>>> >>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >>> >>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> Jaime >>>>>> >>>>>> >>>>>> ************************************ >>>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>>> http://www.scinethpc.ca/testimonials >>>>>> ************************************ >>>>>> --- >>>>>> Jaime Pinto >>>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>>> University of Toronto >>>>>> 661 University Ave. (MaRS), Suite 1140 >>>>>> Toronto, ON, M5G1M1 >>>>>> P: 416-978-2755 >>>>>> C: 416-505-1477 >>>>>> >>>>>> ---------------------------------------------------------------- >>>>>> This message was sent using IMP at SciNet Consortium, University of >>>>>> Toronto. >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at spectrumscale.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>>> Oy IBM Finland Ab >>>>>> PL 265, 00101 Helsinki, Finland >>>>>> Business ID, Y-tunnus: 0195876-3 >>>>>> Registered in Finland >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Thu May 18 22:46:49 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 18 May 2017 21:46:49 +0000 Subject: [gpfsug-discuss] Introduction In-Reply-To: Message-ID: Welcome! 
On May 17, 2017, 4:27:15 AM, neil.wilson at metoffice.gov.uk wrote: From: neil.wilson at metoffice.gov.uk To: gpfsug-discuss at spectrumscale.org Cc: Date: May 17, 2017 4:27:15 AM Subject: [gpfsug-discuss] Introduction Hi All, I help to run a gpfs cluster at the Met Office, Exeter, UK. The cluster is running GPFS 4.2.2.2, it?s used with slurm for batch work - primarily for postprocessing weather and climate change model data generated from our HPC. We currently have 8 NSD nodes with approx 3PB of storage with 70+ client nodes. Kind Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 18 22:55:34 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 18 May 2017 21:55:34 +0000 Subject: [gpfsug-discuss] RPM Packages Message-ID: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? Thanks Simon From makaplan at us.ibm.com Fri May 19 14:50:20 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 19 May 2017 09:50:20 -0400 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Easier than hacking mmbackup or writing/editing policy rules, mmbackup interprets your TSM INCLUDE/EXCLUDE configuration statements -- so that is a supported and recommended way of doing business... If that doesn't do it for your purposes... You're into some light hacking... So look inside the mmbackup and tsbackup33 scripts and you'll find some DEBUG variables that should allow for keeping work and temp files around ... including the generated policy rules. I'm calling this hacking "light", because I don't think you'll need to change the scripts, but just look around and see how you can use what's there to achieve your legitimate purposes. Even so, you will have crossed a line where IBM support is "informal" at best. From: Jez Tucker To: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 03:33 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez On 18/05/17 20:02, Jaime Pinto wrote: Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. 
Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air.... Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan..... Plan your Test and Test your Plan.... Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM.... otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... 
a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset .... However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset.... If you didn't say otherwise, inodes come from the default "root" fileset.... Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned.... BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... 
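For readers following the thread, here is a minimal sketch of the capture-and-edit approach Marc describes, using the filesystem and fileset names from this thread (sgfs1, sysadmin3), the -P option Marc mentions, and the autogenerated ruleset path from Jez's note earlier in the thread; the target paths are placeholders, the exact rule text you capture will vary by release, and per Marc this sits in "informal support" territory at best:

# 1. Start a trial mmbackup run and, once the preflight phase has generated
#    the ruleset, copy it aside (path pattern per Jez's note above):
cp /var/mmfs/mmbackup/.mmbackupRules* /root/mmbackupRules.sysadmin3

# 2. Edit the copy so the candidate-selection rule(s) are restricted to the
#    fileset of interest, for example by appending a clause such as:
#       FOR FILESET('sysadmin3')

# 3. Re-run mmbackup with the customised policy file:
mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -P /root/mmbackupRules.sysadmin3 \
    --tsm-errorlog /tmp/mmbackup.sysadmin3.err -L 2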
From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. ----- Original message ----- From: "Jaime Pinto" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. -------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : Quoting "Luis Bolinches" : Hi have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... 
DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 17-05-17 23:44 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
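Since there is no in-place conversion from dependent to independent, a rough sketch of the supported route discussed in this thread is to create a new independent fileset, migrate the data into it, and then back it up at fileset level. The new fileset name sysadmin3i is made up for illustration and the commands are indicative only:

# Create a fileset with its own inode space and link it into sgfs1:
mmcrfileset sgfs1 sysadmin3i --inode-space new
mmlinkfileset sgfs1 sysadmin3i -J /gpfs/sgfs1/sysadmin3i

# ...copy/rsync the contents of the old dependent sysadmin3 fileset into the
#    new one, then unlink and delete the old fileset as appropriate...

# Fileset-level backup is now accepted because the fileset is independent:
mmbackup /gpfs/sgfs1/sysadmin3i -N tsm-helper1-ib0 --scope inodespace \
    --tsm-errorlog /tmp/mmbackup.sysadmin3i.err -L 2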
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Jez Tucker Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Fri May 19 17:12:20 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Fri, 19 May 2017 16:12:20 +0000 Subject: [gpfsug-discuss] RPM Packages Message-ID: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). 
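As an aside for anyone checking their own nodes while reading this thread, two quick read-only checks are available: rpm lists which edition/license packages are installed, and mmlslicense reports the designation the cluster has recorded. Flags and output vary by release, so treat this as a sketch only.

# Which GPFS edition/license packages are installed on this node?
rpm -qa | grep -i 'gpfs.license'

# Which license designation (client/server/fpo) is recorded per node?
mmlslicense -L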
Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From jonathon.anderson at colorado.edu Fri May 19 17:16:50 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 19 May 2017 16:16:50 +0000 Subject: [gpfsug-discuss] RPM Packages In-Reply-To: References: Message-ID: Data Management Edition optionally replaces the traditional GPFS licensing model with a per-terabyte licensing fee, rather than a per-socket licensing fee. https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS216-158 Presumably installing this RPM is how you tell GPFS which licensing model you?re using. ~jonathon On 5/19/17, 10:12 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mark Bush" wrote: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri May 19 17:43:49 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 19 May 2017 16:43:49 +0000 Subject: [gpfsug-discuss] RPM Packages In-Reply-To: References: , Message-ID: Well, I installed it one node and it still claims that it's advanced licensed on the node (only after installing gpfs.adv of course). I know the license model for DME, but we've never installed the gpfs.license.standard packages before. I agree the XML string pro ably is used somewhere, just not clear if it's needed or not... My guess would be maybe the GUI uses it. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 19 May 2017 17:16 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RPM Packages Data Management Edition optionally replaces the traditional GPFS licensing model with a per-terabyte licensing fee, rather than a per-socket licensing fee. https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS216-158 Presumably installing this RPM is how you tell GPFS which licensing model you?re using. ~jonathon On 5/19/17, 10:12 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mark Bush" wrote: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From tpathare at sidra.org Sun May 21 09:40:42 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 08:40:42 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue Message-ID: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid 
: 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun May 21 09:59:38 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Sun, 21 May 2017 08:59:38 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Message-ID: Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tpathare at sidra.org Sun May 21 10:18:11 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 09:18:11 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Message-ID: <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpathare at sidra.org Sun May 21 10:19:23 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 09:19:23 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> Message-ID: <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: Tushar Pathare Date: Sunday, May 21, 2017 at 12:18 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Sun May 21 15:36:02 2017 From: oehmes at gmail.com (Sven Oehme) Date: Sun, 21 May 2017 14:36:02 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Message-ID: The reason is the default setting of : verbsRdmasPerConnection : 16 you can increase this , on smaller clusters i run on some with 1024, but its not advised to run this on 100's of nodes and not if you know exactly what you are doing. i would start by doubling it to 32 and see how much of the waiters disappear, then go to 64 if you still see too many. don't go beyond 128 unless somebody knowledgeable reviewed your config further going to 32 or 64 is very low risk if you already run with verbs send enabled and don't have issues. On Sun, May 21, 2017 at 2:19 AM Tushar Pathare wrote: > Hello Aaron, > > Yes we saw recently an issue with > > > > VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 > (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 > > And > > > > VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( > sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 > > > > Thanks > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > > > *From: *Tushar Pathare > *Date: *Sunday, May 21, 2017 at 12:18 PM > > > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] VERBS RDMA issue > > > > Hello Aaron, > > Yes we saw recently an issue with > > > > VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 > (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 > > And > > > > > > > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > > > *From: * on behalf of "Knister, > Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" > *Reply-To: *gpfsug main discussion list > *Date: *Sunday, May 21, 2017 at 11:59 AM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] VERBS RDMA issue > > > > Hi Tushar, > > > > For me the issue was an underlying performance bottleneck (some CPU > frequency scaling problems causing cores to throttle back when it wasn't > appropriate). > > > > I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the > past to turn this off under certain conditions although I don't remember > what those where. Hopefully others can chime in and qualify that. > > > > > Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the > mmfs.log). > > > > > -Aaron > > > > > > On May 21, 2017 at 04:41:00 EDT, Tushar Pathare > wrote: > > Hello Team, > > > > We are facing a lot of messages waiters related to *waiting for conn > rdmas < conn maxrdmas > * > > > > Is there some recommended settings to resolve this issue.? > > Our config for RDMA is as follows for 140 nodes(32 cores each) > > > > > > VERBS RDMA Configuration: > > Status : started > > Start time : Thu > > Stats reset time : Thu > > Dump time : Sun > > mmfs verbsRdma : enable > > mmfs verbsRdmaCm : disable > > mmfs verbsPorts : mlx4_0/1 mlx4_0/2 > > mmfs verbsRdmasPerNode : 3200 > > mmfs verbsRdmasPerNode (max) : 3200 > > mmfs verbsRdmasPerNodeOptimize : yes > > mmfs verbsRdmasPerConnection : 16 > > mmfs verbsRdmasPerConnection (max) : 16 > > mmfs verbsRdmaMinBytes : 16384 > > mmfs verbsRdmaRoCEToS : -1 > > mmfs verbsRdmaQpRtrMinRnrTimer : 18 > > mmfs verbsRdmaQpRtrPathMtu : 2048 > > mmfs verbsRdmaQpRtrSl : 0 > > mmfs verbsRdmaQpRtrSlDynamic : no > > mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 > > mmfs verbsRdmaQpRtsRnrRetry : 6 > > mmfs verbsRdmaQpRtsRetryCnt : 6 > > mmfs verbsRdmaQpRtsTimeout : 18 > > mmfs verbsRdmaMaxSendBytes : 16777216 > > mmfs verbsRdmaMaxSendSge : 27 > > mmfs verbsRdmaSend : yes > > mmfs verbsRdmaSerializeRecv : no > > mmfs verbsRdmaSerializeSend : no > > mmfs verbsRdmaUseMultiCqThreads : yes > > mmfs verbsSendBufferMemoryMB : 1024 > > mmfs verbsLibName : libibverbs.so > > mmfs verbsRdmaCmLibName : librdmacm.so > > mmfs verbsRdmaMaxReconnectInterval : 60 > > mmfs verbsRdmaMaxReconnectRetries : -1 > > mmfs verbsRdmaReconnectAction : disable > > mmfs verbsRdmaReconnectThreads : 32 > > mmfs verbsHungRdmaTimeout : 90 > > ibv_fork_support : true > > Max connections : 196608 > > Max RDMA size : 16777216 > > Target number of vsend buffs : 16384 > > Initial vsend buffs per conn : 59 > > nQPs : 140 > > nCQs : 282 > > nCMIDs : 0 > > nDtoThreads : 2 > > nextIndex : 141 > > Number of Devices opened : 1 > > Device : mlx4_0 > > vendor_id : 713 > > Device vendor_part_id : 4099 > > Device mem register chunk : 8589934592 <(858)%20993-4592> > (0x200000000) > > Device max_sge : 32 > > Adjusted max_sge : 0 > > Adjusted max_sge vsend : 30 > > Device max_qp_wr : 16351 > > Device max_qp_rd_atom : 16 > > Open Connect Ports : 1 > > verbsConnectPorts[0] : mlx4_0/1/0 > > lid : 129 > > state : IBV_PORT_ACTIVE > > path_mtu : 2048 > > interface ID : 0xe41d2d030073b9d1 > > sendChannel.ib_channel : 0x7FA6CB816200 > > sendChannel.dtoThreadP : 0x7FA6CB821870 > > sendChannel.dtoThreadId : 12540 > > sendChannel.nFreeCq : 1 > > recvChannel.ib_channel : 0x7FA6CB81D590 > > recvChannel.dtoThreadP : 0x7FA6CB822BA0 > > recvChannel.dtoThreadId : 12541 > > recvChannel.nFreeCq : 1 > > ibv_cq : 0x7FA2724C81F8 > > ibv_cq.cqP : 0x0 > > ibv_cq.nEvents : 0 > > ibv_cq.contextP : 0x0 > > 
ibv_cq.ib_channel : 0x0 > > > > Thanks > > > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > Disclaimer: This email and its attachments may be confidential and are > intended solely for the use of the individual to whom it is addressed. If > you are not the intended recipient, any reading, printing, storage, > disclosure, copying or any other action taken in respect of this e-mail is > prohibited and may be unlawful. If you are not the intended recipient, > please notify the sender immediately by using the reply function and then > permanently delete what you have received. Any views or opinions expressed > are solely those of the author and do not necessarily represent those of > Sidra Medical and Research Center. > > Disclaimer: This email and its attachments may be confidential and are > intended solely for the use of the individual to whom it is addressed. If > you are not the intended recipient, any reading, printing, storage, > disclosure, copying or any other action taken in respect of this e-mail is > prohibited and may be unlawful. If you are not the intended recipient, > please notify the sender immediately by using the reply function and then > permanently delete what you have received. Any views or opinions expressed > are solely those of the author and do not necessarily represent those of > Sidra Medical and Research Center. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpathare at sidra.org Sun May 21 16:56:40 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 15:56:40 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Message-ID: Thanks Sven. Will read more about it and discuss with the team to come to a conclusion Thank you for pointing out the param. Will let you know the results after the tuning. Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 5:36 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue The reason is the default setting of : verbsRdmasPerConnection : 16 you can increase this , on smaller clusters i run on some with 1024, but its not advised to run this on 100's of nodes and not if you know exactly what you are doing. 
i would start by doubling it to 32 and see how much of the waiters disappear, then go to 64 if you still see too many. don't go beyond 128 unless somebody knowledgeable reviewed your config further going to 32 or 64 is very low risk if you already run with verbs send enabled and don't have issues. On Sun, May 21, 2017 at 2:19 AM Tushar Pathare > wrote: Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: Tushar Pathare > Date: Sunday, May 21, 2017 at 12:18 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: > on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > Reply-To: gpfsug main discussion list > Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare > wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
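(For reference, the tuning Sven suggests above would normally be applied with mmchconfig; the node class name used below is only a placeholder, and verbs settings generally take effect only after GPFS is restarted on the affected nodes:

   prompt# mmchconfig verbsRdmasPerConnection=32 -N nsdNodes
   prompt# mmlsconfig verbsRdmasPerConnection
)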
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed May 24 10:43:37 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 24 May 2017 09:43:37 +0000 Subject: [gpfsug-discuss] Report on Scale and Cloud Message-ID: Hi All, I forgot that I never circulated, as part of the RCUK Working Group on Cloud, we produced a report on using Scale with Cloud/Undercloud ... You can download the report from: https://cloud.ac.uk/reports/spectrumscale/ We had some input from various IBM people whilst writing, and bear in mind that its a snapshot of support at the point in time when it was written. Simon From kkr at lbl.gov Wed May 24 20:57:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 24 May 2017 12:57:49 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Message-ID: Hello, We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the ZIMonAddress variable in /usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. 
Thanks, Kristy The peers are added into the ZIMonCollector.cfg using the default port 9085: peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" } And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. cfg: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "hostA.nersc.gov " port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:03:54 0.54 3.67 4961 2 2017-05-23-17:03:55 0.63 3.55 6199 3 2017-05-23-17:03:56 1.59 3.76 7914 4 2017-05-23-17:03:57 1.38 5.34 5393 5 2017-05-23-17:03:58 0.54 2.21 2435 6 2017-05-23-17:03:59 0.13 0.29 2519 7 2017-05-23-17:04:00 0.13 0.25 2197 8 2017-05-23-17:04:01 0.13 0.29 2473 9 2017-05-23-17:04:02 0.08 0.21 2336 10 2017-05-23-17:04:03 0.13 0.21 2312 [root@ hostB ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:04:07 0.13 0.21 2010 2 2017-05-23-17:04:08 0.04 0.21 2571 3 2017-05-23-17:04:09 0.08 0.25 2766 4 2017-05-23-17:04:10 0.13 0.29 3147 5 2017-05-23-17:04:11 0.83 0.83 2596 6 2017-05-23-17:04:12 0.33 0.54 2530 7 2017-05-23-17:04:13 0.08 0.33 2428 8 2017-05-23-17:04:14 0.13 0.25 2326 9 2017-05-23-17:04:15 0.13 0.29 4190 10 2017-05-23-17:04:16 0.58 1.92 5882 [root@ hostB ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:45 0.33 0.46 7460 2 2017-05-23-17:05:46 0.33 0.42 8993 3 2017-05-23-17:05:47 0.42 0.54 8709 4 2017-05-23-17:05:48 0.38 0.5 5923 5 2017-05-23-17:05:49 0.54 1.46 7381 6 2017-05-23-17:05:50 0.58 3.51 10381 7 2017-05-23-17:05:51 1.05 1.13 10995 8 2017-05-23-17:05:52 0.88 0.92 10855 9 2017-05-23-17:05:53 0.5 0.63 10958 10 2017-05-23-17:05:54 0.5 0.59 10285 [root@ hostA ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:50 0.58 3.51 10381 2 2017-05-23-17:05:51 1.05 1.13 10995 3 2017-05-23-17:05:52 0.88 0.92 10855 4 2017-05-23-17:05:53 0.5 0.63 10958 5 2017-05-23-17:05:54 0.5 0.59 10285 6 2017-05-23-17:05:55 0.46 0.63 11621 7 2017-05-23-17:05:56 0.84 0.92 11477 8 2017-05-23-17:05:57 1.47 1.88 11084 9 2017-05-23-17:05:58 0.46 1.76 9125 10 2017-05-23-17:05:59 0.42 0.63 11745 -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylorm at us.ibm.com Thu May 25 14:46:06 2017 From: taylorm at us.ibm.com (Michael L Taylor) Date: Thu, 25 May 2017 06:46:06 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi Kristy, At first glance your config looks ok. Here are a few things to check. Is 4.2.3 the first time you have installed and configured performance monitoring? Or have you configured it at some version < 4.2.3 and then upgraded to 4.2.3? 
Did you restart pmcollector after changing the configuration? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_guienableperfmon.htm "Configure peer configuration for the collectors. The collector configuration is stored in the /opt/IBM/zimon/ZIMonCollector.cfg file. This file defines collector peer configuration and the aggregation rules. If you are using only a single collector, you can skip this step. Restart the pmcollector service after making changes to the configuration file. The GUI must have access to all data from each GUI node. " Firewall ports are open for performance monitoring and MGMT GUI? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforgui.htm?cp=STXKQY https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforPMT.htm Did you setup the collectors with : prompt# mmperfmon config generate --collectors collector1.domain.com,collector2.domain.com,? Once the configuration file has been stored within IBM Spectrum Scale, it can be activated as follows. prompt# mmchnode --perfmon ?N nodeclass1,nodeclass2,? Perhaps once you make sure the federated mode is set between hostA and hostB as you like then 'systemctl restart pmcollector' and then 'systemctl restart gpfsgui' on both nodes? From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 05/24/2017 12:58 PM Subject: gpfsug-discuss Digest, Vol 64, Issue 61 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. SS Metrics (Zimon) and SS GUI, Federation not working (Kristy Kallback-Rose) ---------------------------------------------------------------------- Message: 1 Date: Wed, 24 May 2017 12:57:49 -0700 From: Kristy Kallback-Rose To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Message-ID: Content-Type: text/plain; charset="utf-8" Hello, We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the ZIMonAddress variable in /usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. 
I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. Thanks, Kristy The peers are added into the ZIMonCollector.cfg using the default port 9085: peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" } And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. cfg: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "hostA.nersc.gov " port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:03:54 0.54 3.67 4961 2 2017-05-23-17:03:55 0.63 3.55 6199 3 2017-05-23-17:03:56 1.59 3.76 7914 4 2017-05-23-17:03:57 1.38 5.34 5393 5 2017-05-23-17:03:58 0.54 2.21 2435 6 2017-05-23-17:03:59 0.13 0.29 2519 7 2017-05-23-17:04:00 0.13 0.25 2197 8 2017-05-23-17:04:01 0.13 0.29 2473 9 2017-05-23-17:04:02 0.08 0.21 2336 10 2017-05-23-17:04:03 0.13 0.21 2312 [root@ hostB ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:04:07 0.13 0.21 2010 2 2017-05-23-17:04:08 0.04 0.21 2571 3 2017-05-23-17:04:09 0.08 0.25 2766 4 2017-05-23-17:04:10 0.13 0.29 3147 5 2017-05-23-17:04:11 0.83 0.83 2596 6 2017-05-23-17:04:12 0.33 0.54 2530 7 2017-05-23-17:04:13 0.08 0.33 2428 8 2017-05-23-17:04:14 0.13 0.25 2326 9 2017-05-23-17:04:15 0.13 0.29 4190 10 2017-05-23-17:04:16 0.58 1.92 5882 [root@ hostB ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:45 0.33 0.46 7460 2 2017-05-23-17:05:46 0.33 0.42 8993 3 2017-05-23-17:05:47 0.42 0.54 8709 4 2017-05-23-17:05:48 0.38 0.5 5923 5 2017-05-23-17:05:49 0.54 1.46 7381 6 2017-05-23-17:05:50 0.58 3.51 10381 7 2017-05-23-17:05:51 1.05 1.13 10995 8 2017-05-23-17:05:52 0.88 0.92 10855 9 2017-05-23-17:05:53 0.5 0.63 10958 10 2017-05-23-17:05:54 0.5 0.59 10285 [root@ hostA ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:50 0.58 3.51 10381 2 2017-05-23-17:05:51 1.05 1.13 10995 3 2017-05-23-17:05:52 0.88 0.92 10855 4 2017-05-23-17:05:53 0.5 0.63 10958 5 2017-05-23-17:05:54 0.5 0.59 10285 6 2017-05-23-17:05:55 0.46 0.63 11621 7 2017-05-23-17:05:56 0.84 0.92 11477 8 2017-05-23-17:05:57 1.47 1.88 11084 9 2017-05-23-17:05:58 0.46 1.76 9125 10 2017-05-23-17:05:59 0.42 0.63 11745 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170524/e64509b9/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 64, Issue 61 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NSCHULD at de.ibm.com Thu May 25 15:13:16 2017 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Thu, 25 May 2017 16:13:16 +0200 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi, please upgrade to 4.2.3 ptf1 - the version before has an issue with federated queries in some situations. Mit freundlichen Gr??en / Kind regards Norbert Schuld From: Kristy Kallback-Rose To: gpfsug-discuss at spectrumscale.org Date: 24/05/2017 21:58 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, ? We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. ? hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the?ZIMonAddress variable in?/usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. ? I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. Thanks, Kristy ? The peers are added into the?ZIMonCollector.cfg using the default port 9085: ?peers = { ? ? ? ? host = "hostA" ? ? ? ? port = "9085" ?}, ?{ ? ? ? ? host = "hostB" ? ? ? ? port = "9085" ?} And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors.cfg: colCandidates = "hostA.nersc.gov", "hostB.nersc.gov" colRedundancy = 1 collectors = { host = "hostA.nersc.gov" port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov", "hostB.nersc.gov" colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]#? mmperfmon query cpu -N hostB Legend: ?1: hostB.nersc.gov|CPU|cpu_system ?2:?hostB.nersc.gov|CPU|cpu_user ?3:?hostB.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:03:54 ? ? ? 0.54 ? ? 3.67 ? ? 
? ? 4961 ? 2 2017-05-23-17:03:55 ? ? ? 0.63 ? ? 3.55 ? ? ? ? 6199 ? 3 2017-05-23-17:03:56 ? ? ? 1.59 ? ? 3.76 ? ? ? ? 7914 ? 4 2017-05-23-17:03:57 ? ? ? 1.38 ? ? 5.34 ? ? ? ? 5393 ? 5 2017-05-23-17:03:58 ? ? ? 0.54 ? ? 2.21 ? ? ? ? 2435 ? 6 2017-05-23-17:03:59 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 2519 ? 7 2017-05-23-17:04:00 ? ? ? 0.13 ? ? 0.25 ? ? ? ? 2197 ? 8 2017-05-23-17:04:01 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 2473 ? 9 2017-05-23-17:04:02 ? ? ? 0.08 ? ? 0.21 ? ? ? ? 2336 ?10 2017-05-23-17:04:03 ? ? ? 0.13 ? ? 0.21 ? ? ? ? 2312 [root@?hostB?~]#? mmperfmon query cpu -N?hostB Legend: ?1:?hostB.nersc.gov|CPU|cpu_system ?2:?hostB.nersc.gov|CPU|cpu_user ?3:?hostB.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:04:07 ? ? ? 0.13 ? ? 0.21 ? ? ? ? 2010 ? 2 2017-05-23-17:04:08 ? ? ? 0.04 ? ? 0.21 ? ? ? ? 2571 ? 3 2017-05-23-17:04:09 ? ? ? 0.08 ? ? 0.25 ? ? ? ? 2766 ? 4 2017-05-23-17:04:10 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 3147 ? 5 2017-05-23-17:04:11 ? ? ? 0.83 ? ? 0.83 ? ? ? ? 2596 ? 6 2017-05-23-17:04:12 ? ? ? 0.33 ? ? 0.54 ? ? ? ? 2530 ? 7 2017-05-23-17:04:13 ? ? ? 0.08 ? ? 0.33 ? ? ? ? 2428 ? 8 2017-05-23-17:04:14 ? ? ? 0.13 ? ? 0.25 ? ? ? ? 2326 ? 9 2017-05-23-17:04:15 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 4190 ?10 2017-05-23-17:04:16 ? ? ? 0.58 ? ? 1.92 ? ? ? ? 5882 [root@?hostB?~]#? mmperfmon query cpu -N?hostA Legend: ?1:?hostA.nersc.gov|CPU|cpu_system ?2:?hostA.nersc.gov|CPU|cpu_user ?3:?hostA.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:05:45 ? ? ? 0.33 ? ? 0.46 ? ? ? ? 7460 ? 2 2017-05-23-17:05:46 ? ? ? 0.33 ? ? 0.42 ? ? ? ? 8993 ? 3 2017-05-23-17:05:47 ? ? ? 0.42 ? ? 0.54 ? ? ? ? 8709 ? 4 2017-05-23-17:05:48 ? ? ? 0.38? ? ? 0.5 ? ? ? ? 5923 ? 5 2017-05-23-17:05:49 ? ? ? 0.54 ? ? 1.46 ? ? ? ? 7381 ? 6 2017-05-23-17:05:50 ? ? ? 0.58 ? ? 3.51? ? ? ? 10381 ? 7 2017-05-23-17:05:51 ? ? ? 1.05 ? ? 1.13? ? ? ? 10995 ? 8 2017-05-23-17:05:52 ? ? ? 0.88 ? ? 0.92? ? ? ? 10855 ? 9 2017-05-23-17:05:53? ? ? ? 0.5 ? ? 0.63? ? ? ? 10958 ?10 2017-05-23-17:05:54? ? ? ? 0.5 ? ? 0.59? ? ? ? 10285 [root@?hostA?~]#? mmperfmon query cpu -N?hostA Legend: ?1:?hostA.nersc.gov|CPU|cpu_system ?2:?hostA.nersc.gov|CPU|cpu_user ?3:?hostA.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:05:50 ? ? ? 0.58 ? ? 3.51? ? ? ? 10381 ? 2 2017-05-23-17:05:51 ? ? ? 1.05 ? ? 1.13? ? ? ? 10995 ? 3 2017-05-23-17:05:52 ? ? ? 0.88 ? ? 0.92? ? ? ? 10855 ? 4 2017-05-23-17:05:53? ? ? ? 0.5 ? ? 0.63? ? ? ? 10958 ? 5 2017-05-23-17:05:54? ? ? ? 0.5 ? ? 0.59? ? ? ? 10285 ? 6 2017-05-23-17:05:55 ? ? ? 0.46 ? ? 0.63? ? ? ? 11621 ? 7 2017-05-23-17:05:56 ? ? ? 0.84 ? ? 0.92? ? ? ? 11477 ? 8 2017-05-23-17:05:57 ? ? ? 1.47 ? ? 1.88? ? ? ? 11084 ? 9 2017-05-23-17:05:58 ? ? ? 0.46 ? ? 1.76 ? ? ? ? 9125 ?10 2017-05-23-17:05:59 ? ? ? 0.42 ? ? 0.63? ? ? ? 11745 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Thu May 25 22:51:32 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 May 2017 14:51:32 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi Michael, Norbert, Thanks for your replies, we did do all the setup as Michael described, and stop and restart services more than once ;-). I believe the issue is resolved with the PTF. I am still checking, but it seems to be working with symmetric peering between those two nodes. I will test further and expand to other nodes and make sure it continue to work. I will report back if I run into any other issues. Cheers, Kristy On Thu, May 25, 2017 at 6:46 AM, Michael L Taylor wrote: > Hi Kristy, > At first glance your config looks ok. Here are a few things to check. > > Is 4.2.3 the first time you have installed and configured performance > monitoring? Or have you configured it at some version < 4.2.3 and then > upgraded to 4.2.3? > > > Did you restart pmcollector after changing the configuration? > > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adv_guienableperfmon.htm > "Configure peer configuration for the collectors. The collector > configuration is stored in the /opt/IBM/zimon/ZIMonCollector.cfg file. > This file defines collector peer configuration and the aggregation rules. > If you are using only a single collector, you can skip this step. Restart > the pmcollector service after making changes to the configuration file. The > GUI must have access to all data from each GUI node. " > > Firewall ports are open for performance monitoring and MGMT GUI? > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforgui.htm?cp=STXKQY > https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/ > com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforPMT.htm > > Did you setup the collectors with : > prompt# mmperfmon config generate --collectors collector1.domain.com, > collector2.domain.com,? > > Once the configuration file has been stored within IBM Spectrum Scale, it > can be activated as follows. > prompt# mmchnode --perfmon ?N nodeclass1,nodeclass2,? > > Perhaps once you make sure the federated mode is set between hostA and > hostB as you like then 'systemctl restart pmcollector' and then 'systemctl > restart gpfsgui' on both nodes? 
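Pulling the suggestions above together, a minimal sketch of the symmetric two-collector setup being discussed (hostA/hostB are the placeholders already used in this thread; sensorNodeClass is an assumed node class name):

   prompt# mmperfmon config generate --collectors hostA,hostB
   prompt# mmchnode --perfmon -N sensorNodeClass
   # add the symmetric peers stanza to /opt/IBM/zimon/ZIMonCollector.cfg on both collectors:
   #   peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" }
   # then restart the services on both nodes:
   prompt# systemctl restart pmcollector
   prompt# systemctl restart gpfsgui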
> > > > [image: Inactive hide details for gpfsug-discuss-request---05/24/2017 > 12:58:21 PM---Send gpfsug-discuss mailing list submissions to gp] > gpfsug-discuss-request---05/24/2017 12:58:21 PM---Send gpfsug-discuss > mailing list submissions to gpfsug-discuss at spectrumscale.org > > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 05/24/2017 12:58 PM > Subject: gpfsug-discuss Digest, Vol 64, Issue 61 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. SS Metrics (Zimon) and SS GUI, Federation not working > (Kristy Kallback-Rose) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 24 May 2017 12:57:49 -0700 > From: Kristy Kallback-Rose > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation > not working > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > > Hello, > > We have been experimenting with Zimon and the SS GUI on our dev cluster > under 4.2.3. Things work well with one collector, but I'm running into > issues when trying to use symmetric collector peers, i.e. federation. > > hostA and hostB are setup as both collectors and sensors with each a > collector peer for the other. When this is done I can use mmperfmon to > query hostA from hostA or hostB and vice versa. However, with this > federation setup, the GUI fails to show data. The GUI is running on hostB. > >From the collector candidate pool, hostA has been selected (automatically, > not manually) as can be seen in the sensor configuration file. The GUI is > unable to load data (just shows "Loading" on the graph), *unless* I change > the setting of the ZIMonAddress variable in > /usr/lpp/mmfs/gui/conf/gpfsgui.properties > from localhost to hostA explicitly, it does not work if I change it to > hostB explicitly. The GUI also works fine if I remove the peer entries > altogether and just have one collector. > > I thought that federation meant that no matter which collector was > queried the data would be returned. This appears to work for mmperfmon, but > not the GUI. Can anyone advise? I also don't like the idea of having a pool > of collector candidates and hard-coding one into the GUI configuration. I > am including some output below to show the configs and query results. > > Thanks, > > Kristy > > > The peers are added into the ZIMonCollector.cfg using the default port > 9085: > > peers = { > > host = "hostA" > > port = "9085" > > }, > > { > > host = "hostB" > > port = "9085" > > } > > > And the nodes are added as collector candidates, on hostA and hostB you > see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. 
> cfg: > > colCandidates = "hostA.nersc.gov ", " > hostB.nersc.gov " > > colRedundancy = 1 > > collectors = { > > host = "hostA.nersc.gov " > > port = "4739" > > } > > > Showing the config with mmperfmon config show: > > colCandidates = "hostA.nersc.gov ", " > hostB.nersc.gov " > > colRedundancy = 1 > > collectors = { > > host = "" > > > Using mmperfmon I can query either host. > > > [root at hostA ~]# mmperfmon query cpu -N hostB > > > Legend: > > 1: hostB.nersc.gov |CPU|cpu_system > > 2: hostB.nersc.gov |CPU|cpu_user > > 3: hostB.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:03:54 0.54 3.67 4961 > > 2 2017-05-23-17:03:55 0.63 3.55 6199 > > 3 2017-05-23-17:03:56 1.59 3.76 7914 > > 4 2017-05-23-17:03:57 1.38 5.34 5393 > > 5 2017-05-23-17:03:58 0.54 2.21 2435 > > 6 2017-05-23-17:03:59 0.13 0.29 2519 > > 7 2017-05-23-17:04:00 0.13 0.25 2197 > > 8 2017-05-23-17:04:01 0.13 0.29 2473 > > 9 2017-05-23-17:04:02 0.08 0.21 2336 > > 10 2017-05-23-17:04:03 0.13 0.21 2312 > > > [root@ hostB ~]# mmperfmon query cpu -N hostB > > > Legend: > > 1: hostB.nersc.gov |CPU|cpu_system > > 2: hostB.nersc.gov |CPU|cpu_user > > 3: hostB.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:04:07 0.13 0.21 2010 > > 2 2017-05-23-17:04:08 0.04 0.21 2571 > > 3 2017-05-23-17:04:09 0.08 0.25 2766 > > 4 2017-05-23-17:04:10 0.13 0.29 3147 > > 5 2017-05-23-17:04:11 0.83 0.83 2596 > > 6 2017-05-23-17:04:12 0.33 0.54 2530 > > 7 2017-05-23-17:04:13 0.08 0.33 2428 > > 8 2017-05-23-17:04:14 0.13 0.25 2326 > > 9 2017-05-23-17:04:15 0.13 0.29 4190 > > 10 2017-05-23-17:04:16 0.58 1.92 5882 > > > [root@ hostB ~]# mmperfmon query cpu -N hostA > > > Legend: > > 1: hostA.nersc.gov |CPU|cpu_system > > 2: hostA.nersc.gov |CPU|cpu_user > > 3: hostA.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:05:45 0.33 0.46 7460 > > 2 2017-05-23-17:05:46 0.33 0.42 8993 > > 3 2017-05-23-17:05:47 0.42 0.54 8709 > > 4 2017-05-23-17:05:48 0.38 0.5 5923 > > 5 2017-05-23-17:05:49 0.54 1.46 7381 > > 6 2017-05-23-17:05:50 0.58 3.51 10381 > > 7 2017-05-23-17:05:51 1.05 1.13 10995 > > 8 2017-05-23-17:05:52 0.88 0.92 10855 > > 9 2017-05-23-17:05:53 0.5 0.63 10958 > > 10 2017-05-23-17:05:54 0.5 0.59 10285 > > > [root@ hostA ~]# mmperfmon query cpu -N hostA > > > Legend: > > 1: hostA.nersc.gov |CPU|cpu_system > > 2: hostA.nersc.gov |CPU|cpu_user > > 3: hostA.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:05:50 0.58 3.51 10381 > > 2 2017-05-23-17:05:51 1.05 1.13 10995 > > 3 2017-05-23-17:05:52 0.88 0.92 10855 > > 4 2017-05-23-17:05:53 0.5 0.63 10958 > > 5 2017-05-23-17:05:54 0.5 0.59 10285 > > 6 2017-05-23-17:05:55 0.46 0.63 11621 > > 7 2017-05-23-17:05:56 0.84 0.92 11477 > > 8 2017-05-23-17:05:57 1.47 1.88 11084 > > 9 2017-05-23-17:05:58 0.46 1.76 9125 > > 10 2017-05-23-17:05:59 0.42 0.63 11745 > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: 20170524/e64509b9/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 64, Issue 61 > ********************************************** > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon May 29 21:01:38 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 29 May 2017 16:01:38 -0400 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: <20170529160138.18847jpj5x9kz8ki@support.scinet.utoronto.ca> Quoting "Marc A Kaplan" : > Easier than hacking mmbackup or writing/editing policy rules, > > mmbackup interprets > your TSM INCLUDE/EXCLUDE configuration statements -- so that is a > supported and recommended way of doing business... Finally got some time to resume testing on this Here is the syntax used (In this test I want to backup /wosgpfs/backmeup only) mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm --tsm-errorlog $logfile -L 4 As far as I can tell, the EXCLUDE statements in the TSM configuration (dsm.opt) are being *ignored*. I tried a couple of formats: 1) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" INCLExcl "/sysadmin/BA/ba-wos/bin/inclexcl" 1a) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore /wosgpfs/junk /wosgpfs/project 1b) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 2) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 3) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup -/wosgpfs/ignore -/wosgpfs/junk -/wosgpfs/project" In another words, all the contents under /wosgpfs are being traversed and going to the TSM backup. Furthermore, even with "-L 4" mmbackup is not logging the list of files being sent to the TSM backup anywhere on the client side. I only get that information from the TSM server side (get filespace). I know that all contents of /wosgpfs are being traversed because I have a tail on /wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update > > If that doesn't do it for your purposes... You're into some light > hacking... So look inside the mmbackup and tsbackup33 scripts and you'll > find some DEBUG variables that should allow for keeping work and temp > files around ... including the generated policy rules. > I'm calling this hacking "light", because I don't think you'll need to > change the scripts, but just look around and see how you can use what's > there to achieve your legitimate purposes. Even so, you will have crossed > a line where IBM support is "informal" at best. On the other hand I am having better luck with the customer rules file. 
The modified template below will traverse only the /wosgpfs/backmeup, as intended, and only backup files modified under that path. I guess I have a working solution that I will try at scale now. [root at wos-gateway02 bin]# cat dsm.opt SERVERNAME TAPENODE3 ARCHSYMLINKASFILE NO DOMAIN "/wosgpfs/backmeup" __________________________________________________________ /* Auto-generated GPFS policy rules file * Generated on Wed May 24 12:12:51 2017 */ /* Server rules for backup server 1 *** TAPENODE3 *** */ RULE EXTERNAL LIST 'mmbackup.1.TAPENODE3' EXEC '/wosgpfs/.mmbackupCfg/BAexecScript.wosgpfs' OPTS '"/wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update" "-servername=TAPENODE3" "-auditlogname=/wosgpfs/mmbackup.audit.wosgpfs.TAPENODE3" "NONE"' RULE 'BackupRule' LIST 'mmbackup.1.TAPENODE3' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' ELSE 'resdnt' END )) WHERE ( NOT ( (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (PATH_NAME LIKE '/%/.g2w/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/ignore/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/junk/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/project/%') OR /* DO NOT TRAVERSE OR BACKUP */ (MODE LIKE 's%') ) ) AND (PATH_NAME LIKE '/%/backmeup/%') /* TRAVERSE AND BACKUP */ AND (MISC_ATTRIBUTES LIKE '%u%') AND ( NOT ( (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') ) ) AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) _________________________________________________________ [root at wos-gateway02 bin]# time ./mmbackup-wos.sh -------------------------------------------------------- mmbackup: Backup of /wosgpfs begins at Mon May 29 15:54:47 EDT 2017. -------------------------------------------------------- Mon May 29 15:54:49 2017 mmbackup:using user supplied policy rules: /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs Mon May 29 15:54:49 2017 mmbackup:Scanning file system wosgpfs Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3]. Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3] Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired]. mmbackup: TSM Summary Information: Total number of objects inspected: 3 Total number of objects backed up: 3 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of bytes inspected: 4096 Total number of bytes transferred: 512 ---------------------------------------------------------- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. ---------------------------------------------------------- real 0m9.276s user 0m2.906s sys 0m3.212s _________________________________________________________ Thanks for all the help Jaime > > > > > From: Jez Tucker > To: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 03:33 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? 
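An aside on the exclude formats tried above: on UNIX/Linux clients the include-exclude options normally live in dsm.sys (or in a file named by the INCLEXCL option in dsm.sys) rather than in dsm.opt, and exclude.dir takes a single directory per statement, e.g.

   exclude.dir /wosgpfs/ignore
   exclude.dir /wosgpfs/junk
   exclude.dir /wosgpfs/project

Whether mmbackup applies these when it builds its candidate lists can depend on the code level, so the policy-rules route shown below is the more deterministic one.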
was: > mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > When mmbackup has passed the preflight stage (pretty quickly) you'll > find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* > > Best, > > Jez > > > On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is using > as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > > thin air.... Capture the rules mmbackup creates and make small changes to > > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > > Plan.... Then do some dry run recoveries before you really "need" to do a > > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. > > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. It would be nice to get comments from > somebody familiar with the inner works of mmbackup. 
> > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that > the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support > team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused > and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such > thing. > Filesets are kinda like little filesystems within filesystems. Moving > a > file from one fileset to another requires a copy operation. There is > no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that > are > in a separable range of inode numbers - this allows GPFS to efficiently > do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be > represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you > may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor > ESS, > so anyone in this list feel free to give feedback on that page people > with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a > new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. 
exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > > Quoting "Luis Bolinches" : > > Hi > > have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as > inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, > and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > > > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Jez Tucker > Head of Research and Development, Pixit Media > 07764193820 | jtucker at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia.com > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Tomasz.Wolski at ts.fujitsu.com Mon May 29 21:23:12 2017 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Mon, 29 May 2017 20:23:12 +0000 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Message-ID: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to "Concepts, Planning and Installation Guide" (for 4.2.3), there's a limited compatibility between two GPFS versions and if they're not adjacent, then following update path is advised: "If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x" My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with "mmchconfig release=LATEST" until all nodes in the cluster will have been updated to version 4.2.3? 
In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) [cid:image002.gif at 01CE62B9.8ACFA960] FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2774 bytes Desc: image001.gif URL: From knop at us.ibm.com Tue May 30 03:54:04 2017 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 29 May 2017 22:54:04 -0400 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Tomasz, The statement below from "Concepts, Planning and Installation Guide" was found to be incorrect and is being withdrawn from the publications. The team is currently working on improvements to the guidance being provided for migration. For a cluster which is not running protocols like NFS/SMB/Object, migration of nodes one-at-a-time from 4.1.1 to 4.2.3 should work. Once all nodes are migrated to 4.2.3, command mmchconfig release=LATEST can be issued to move the cluster to the 4.2.3 level. Note that the command above will not change the file system level. The file system can be moved to the latest level with command mmchfs file-system-name -V full In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? That is expected to work. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 05/29/2017 04:24 PM Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to ?Concepts, Planning and Installation Guide? (for 4.2.3), there?s a limited compatibility between two GPFS versions and if they?re not adjacent, then following update path is advised: ?If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x? My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with ?mmchconfig release=LATEST? until all nodes in the cluster will have been updated to version 4.2.3? 
In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2774 bytes Desc: not available URL: From Achim.Rehor at de.ibm.com Tue May 30 08:42:23 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 30 May 2017 09:42:23 +0200 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2774 bytes Desc: not available URL: From andreas.petzold at kit.edu Tue May 30 13:16:40 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 14:16:40 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes Message-ID: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From john.hearns at asml.com Tue May 30 13:28:17 2017 From: john.hearns at asml.com (John Hearns) Date: Tue, 30 May 2017 12:28:17 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: Andreas, This is a stupid reply, but please bear with me. Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. However when two or more of the same application were running the job would take several hours. We finally found that this slowdown was due to the IO size, the application was using the default size. We only found this by stracing the application and spending hours staring at the trace... I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. A good tool to get a general feel for IO pattersn is 'iotop'. It might help? 
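For what it's worth, if you do end up stracing, a rough way to sample the request sizes of one suspect process (the PID and output path below are just placeholders) is something like:

   # attach to the suspect process and log only the read/write family of calls
   strace -f -e trace=read,write,pread64,pwrite64 -p 12345 -o /tmp/io-trace.log
   # after a minute or so, stop it and summarise the byte counts each call returned
   grep -oE '= [0-9]+$' /tmp/io-trace.log | sort | uniq -c | sort -rn | head

If the top of that list is dominated by tiny requests (4096 bytes or less, matching the 8-sector reads above) you have a strong hint which application to chase.
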
-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) Sent: Tuesday, May 30, 2017 2:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Associating I/O operations with files/processes Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu https://emea01.safelinks.protection.outlook.com/?url=www.scc.kit.edu&data=01%7C01%7Cjohn.hearns%40asml.com%7Cd3f8f819bf21408c419e08d4a755bde9%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=IwCAFwU6OI38yZK9cnmAcWpWD%2BlujeYDpgXuvvAdvVg%3D&reserved=0 KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From andreas.petzold at kit.edu Tue May 30 14:12:52 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 15:12:52 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Hi John, iotop wasn't helpful. It seems to be overwhelmed by what is going on on the machine. Cheers, Andreas On 05/30/2017 02:28 PM, John Hearns wrote: > Andreas, > This is a stupid reply, but please bear with me. > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. > We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. > However when two or more of the same application were running the job would take several hours. > > We finally found that this slowdown was due to the IO size, the application was using the default size. > We only found this by stracing the application and spending hours staring at the trace... > > I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. > A good tool to get a general feel for IO pattersn is 'iotop'. It might help? > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) > Sent: Tuesday, May 30, 2017 2:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > Dear group, > > first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. > > Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 > > Since a few days we see this on the problematic node: > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 > > The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. > > We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? > > Thanks, > > Andreas > > -- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From aaron.s.knister at nasa.gov Tue May 30 14:47:52 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Tue, 30 May 2017 13:47:52 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> , <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: <89666459-A01A-4B1D-BEDF-F742E8E888A9@nasa.gov> Hi Andreas, I often start with an lsof to see who has files open on the troubled filesystem and then start stracing the various processes to see which is responsible. It ought to be a process blocked in uninterruptible sleep and ideally would be obvious but on a shared machine it might not be. Something else you could do is a reverse lookup of the disk addresseses in iohist using mmfileid. This won't help if these are transient files but it could point you in the right direction. Careful though it'll give your metadata disks a tickle :) the syntax would be "mmfileid $FsName -d :$DiskAddrrss" where $DiskAddress is the 4th field from the iohist". It's not a quick command to return-- it could easily take up to a half hour but it should tell you which file path contains that disk address. Sometimes this is all too tedious and in that case grabbing some trace data can help. When you're experiencing I/O trouble you can run "mmtrace trace=def start" on the node, wait about a minute or so and then run "mmtrace stop". 
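In sketch form (trace=def and a one-minute window are just a reasonable starting point, not gospel):

   mmtrace trace=def start    # on the troubled node only
   sleep 60                   # let it capture a minute or so of the bad behaviour
   mmtrace stop               # stop the capture; this is what produces the trcrpt file mentioned below
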
The resulting trcrpt file is bit of a monster to go through but I do believe you can see which PIDs are responsible for the I/O given some sleuthing. If it comes to that let me know and I'll see if I can point you at some phrases to grep for. It's been a while since I've done it. -Aaron On May 30, 2017 at 09:13:09 EDT, Andreas Petzold (SCC) wrote: Hi John, iotop wasn't helpful. It seems to be overwhelmed by what is going on on the machine. Cheers, Andreas On 05/30/2017 02:28 PM, John Hearns wrote: > Andreas, > This is a stupid reply, but please bear with me. > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. > We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. > However when two or more of the same application were running the job would take several hours. > > We finally found that this slowdown was due to the IO size, the application was using the default size. > We only found this by stracing the application and spending hours staring at the trace... > > I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. > A good tool to get a general feel for IO pattersn is 'iotop'. It might help? > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) > Sent: Tuesday, May 30, 2017 2:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > Dear group, > > first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. > > Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 > > Since a few days we see this on the problematic node: > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 > > The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. > > We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? > > Thanks, > > Andreas > > -- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue May 30 14:55:30 2017 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 30 May 2017 13:55:30 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: Hi, the very first thing to do would be to do a mmfsadm dump iohist instead of mmdiag --iohist one time (we actually add this info in next release to mmdiag --iohist) to see if the thread type will reveal something : 07:25:53.578522 W data 1:20260249600 8192 35.930 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch WritebehindWorkerThread 07:25:53.632722 W data 1:20260257792 8192 45.179 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner CleanBufferThread 07:25:53.662067 W data 2:20259815424 8192 45.612 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch WritebehindWorkerThread 07:25:53.734274 W data 1:19601858560 8 0.624 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler *DioHandlerThread* if you see DioHandlerThread most likely somebody changed a openflag to use O_DIRECT . if you don't use that flag even the app does only 4k i/o which is inefficient GPFS will detect this and do prefetch writebehind in large blocks, as soon as you add O_DIRECT, we don't do this anymore to honor the hint and then every single request gets handled one by one. 
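A quick way to double-check the O_DIRECT suspicion from the OS side (PID, fd number and mount point below are placeholders; on x86_64 Linux O_DIRECT is the octal 040000 bit) is to look at the open flags in /proc:

   # find which file descriptors of the suspect process point into the file system
   ls -l /proc/12345/fd | grep /path/to/filesystem
   # inspect one of them; 040000 set in the octal flags line means it was opened O_DIRECT
   grep '^flags:' /proc/12345/fdinfo/42
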
after that the next thing would be to run a very low level trace with just IO infos like : mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N . this will start collection on the node you execute the command if you want to run it against a different node replace the dot at the end with the hostname . wait a few seconds and run mmtracectl --off you will get a message that the trace gets formated and a eventually a trace file . now grep for FIO and you get lines like this : 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 nSectors 32768 err 0 if you further reduce it to nSectors 8 you would focus only on your 4k writes you mentioned above. the key item in the line above you care about is tag 16... this is the inode number of your file. if you now do : cd /usr/lpp/mmfs/samples/util ; make then run (replace -i and filesystem path obviously) [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/ and you get a hit like this : 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf you now know the file that is being accessed in the I/O example above is /ibm/fs2-16m-09//shared/test-newbuf hope that helps. sven On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) < andreas.petzold at kit.edu> wrote: > Hi John, > > iotop wasn't helpful. It seems to be overwhelmed by what is going on on > the machine. > > Cheers, > > Andreas > > On 05/30/2017 02:28 PM, John Hearns wrote: > > Andreas, > > This is a stupid reply, but please bear with me. > > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS > filesystem) setup. > > We also had a new application which did post-processing One of the users > reported that a post-processing job would take about 30 minutes. > > However when two or more of the same application were running the job > would take several hours. > > > > We finally found that this slowdown was due to the IO size, the > application was using the default size. > > We only found this by stracing the application and spending hours > staring at the trace... > > > > I am sure there are better tools for this, and I do hope you don?t have > to strace every application.... really. > > A good tool to get a general feel for IO pattersn is 'iotop'. It might > help? > > > > > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold > (SCC) > > Sent: Tuesday, May 30, 2017 2:17 PM > > To: gpfsug-discuss at spectrumscale.org > > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > > > Dear group, > > > > first a quick introduction: at KIT we are running a 20+PB storage system > with several large (1-9PB) file systems. We have a 14 node NSD server > cluster and 5 small (~10 nodes) protocol node clusters which each mount one > of the file systems. The protocol nodes run server software (dCache, > xrootd) specific to our users which primarily are the LHC experiments at > CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, > while the protocol nodes communicate via Ethernet to their clients. > > > > Now let me describe the problem we are facing. Since a few days, one of > the protocol nodes shows a very strange and as of yet unexplained I/O > behaviour. 
Before we were usually seeing reads like this (iohist example > from a well behaved node): > > > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli > 0A417D79:58E3B179 172.18.224.19 > > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli > 0A4179AD:58E3AE66 172.18.224.14 > > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli > 0A4179AD:58E3ADD7 172.18.224.14 > > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli > 0A417DAD:58E3B266 172.18.224.21 > > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli > 0A417DAD:58E3B206 172.18.224.21 > > > > Since a few days we see this on the problematic node: > > > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.268626 R data 40:137280768624 8 0.395 cli > 0A4179AD:58E3ADE3 172.18.224.14 > > 14:06:27.269056 R data 46:56452781528 8 0.427 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269417 R data 47:97273159640 8 0.293 cli > 0A4179AD:58E3AE5A 172.18.224.14 > > 14:06:27.269293 R data 49:59102786168 8 0.425 cli > 0A4179AD:58E3AE72 172.18.224.14 > > 14:06:27.269531 R data 46:142387326944 8 0.340 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269377 R data 28:102988517096 8 0.554 cli > 0A417879:58E3AD08 172.18.224.10 > > > > The number of read ops has gone up by O(1000) which is what one would > expect when going from 8192 sector reads to 8 sector reads. > > > > We have already excluded problems of node itself so we are focusing on > the applications running on the node. What we'd like to to is to associate > the I/O requests either with files or specific processes running on the > machine in order to be able to blame the correct application. Can somebody > tell us, if this is possible and if now, if there are other ways to > understand what application is causing this? > > > > Thanks, > > > > Andreas > > > > -- > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.petzold at kit.edu Tue May 30 15:00:27 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 16:00:27 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: <45aa5c60-4a79-015a-7236-556b7834714f@kit.edu> Hi Sven, we are seeing FileBlockRandomReadFetchHandlerThread. I'll let you know once I have more results Thanks, Andreas On 05/30/2017 03:55 PM, Sven Oehme wrote: > Hi, > > the very first thing to do would be to do a mmfsadm dump iohist instead > of mmdiag --iohist one time (we actually add this info in next release > to mmdiag --iohist) to see if the thread type will reveal something : > > 07:25:53.578522 W data 1:20260249600 8192 35.930 > 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch > WritebehindWorkerThread > 07:25:53.632722 W data 1:20260257792 8192 45.179 > 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner > CleanBufferThread > 07:25:53.662067 W data 2:20259815424 8192 45.612 > 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch > WritebehindWorkerThread > 07:25:53.734274 W data 1:19601858560 8 0.624 > 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler > *_DioHandlerThread_* > > if you see DioHandlerThread most likely somebody changed a openflag to > use O_DIRECT . 
if you don't use that flag even the app does only 4k i/o > which is inefficient GPFS will detect this and do prefetch writebehind > in large blocks, as soon as you add O_DIRECT, we don't do this anymore > to honor the hint and then every single request gets handled one by one. > > after that the next thing would be to run a very low level trace with > just IO infos like : > > mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N . > > this will start collection on the node you execute the command if you > want to run it against a different node replace the dot at the end with > the hostname . > wait a few seconds and run > > mmtracectl --off > > you will get a message that the trace gets formated and a eventually a > trace file . > now grep for FIO and you get lines like this : > > > 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize > 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 > nSectors 32768 err 0 > > if you further reduce it to nSectors 8 you would focus only on your 4k > writes you mentioned above. > > the key item in the line above you care about is tag 16... this is the > inode number of your file. > if you now do : > > cd /usr/lpp/mmfs/samples/util ; make > then run (replace -i and filesystem path obviously) > > [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/ > > and you get a hit like this : > > 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf > > you now know the file that is being accessed in the I/O example above is > /ibm/fs2-16m-09//shared/test-newbuf > > hope that helps. > > sven > > > > > On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) > > wrote: > > Hi John, > > iotop wasn't helpful. It seems to be overwhelmed by what is going on on > the machine. > > Cheers, > > Andreas > > On 05/30/2017 02:28 PM, John Hearns wrote: > > Andreas, > > This is a stupid reply, but please bear with me. > > Not exactly GPFS related, but I once managed an SGI CXFS > (Clustered XFS filesystem) setup. > > We also had a new application which did post-processing One of the > users reported that a post-processing job would take about 30 minutes. > > However when two or more of the same application were running the > job would take several hours. > > > > We finally found that this slowdown was due to the IO size, the > application was using the default size. > > We only found this by stracing the application and spending hours > staring at the trace... > > > > I am sure there are better tools for this, and I do hope you don?t > have to strace every application.... really. > > A good tool to get a general feel for IO pattersn is 'iotop'. It > might help? > > > > > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at spectrumscale.org > > [mailto:gpfsug-discuss-bounces at spectrumscale.org > ] On Behalf Of > Andreas Petzold (SCC) > > Sent: Tuesday, May 30, 2017 2:17 PM > > To: gpfsug-discuss at spectrumscale.org > > > Subject: [gpfsug-discuss] Associating I/O operations with > files/processes > > > > Dear group, > > > > first a quick introduction: at KIT we are running a 20+PB storage > system with several large (1-9PB) file systems. We have a 14 node > NSD server cluster and 5 small (~10 nodes) protocol node clusters > which each mount one of the file systems. The protocol nodes run > server software (dCache, xrootd) specific to our users which > primarily are the LHC experiments at CERN. GPFS version is 4.2.2 > everywhere. 
All servers are connected via IB, while the protocol > nodes communicate via Ethernet to their clients. > > > > Now let me describe the problem we are facing. Since a few days, > one of the protocol nodes shows a very strange and as of yet > unexplained I/O behaviour. Before we were usually seeing reads like > this (iohist example from a well behaved node): > > > > 14:03:37.637526 R data 32:138835918848 8192 46.626 > cli 0A417D79:58E3B179 172.18.224.19 > > 14:03:37.660177 R data 18:12590325760 8192 25.498 > cli 0A4179AD:58E3AE66 172.18.224.14 > > 14:03:37.640660 R data 15:106365067264 8192 45.682 > cli 0A4179AD:58E3ADD7 172.18.224.14 > > 14:03:37.657006 R data 35:130482421760 8192 30.872 > cli 0A417DAD:58E3B266 172.18.224.21 > > 14:03:37.643908 R data 33:107847139328 8192 45.571 > cli 0A417DAD:58E3B206 172.18.224.21 > > > > Since a few days we see this on the problematic node: > > > > 14:06:27.253537 R data 46:126258287872 8 15.474 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.268626 R data 40:137280768624 8 0.395 > cli 0A4179AD:58E3ADE3 172.18.224.14 > > 14:06:27.269056 R data 46:56452781528 8 0.427 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269417 R data 47:97273159640 8 0.293 > cli 0A4179AD:58E3AE5A 172.18.224.14 > > 14:06:27.269293 R data 49:59102786168 8 0.425 > cli 0A4179AD:58E3AE72 172.18.224.14 > > 14:06:27.269531 R data 46:142387326944 8 0.340 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269377 R data 28:102988517096 8 0.554 > cli 0A417879:58E3AD08 172.18.224.10 > > > > The number of read ops has gone up by O(1000) which is what one > would expect when going from 8192 sector reads to 8 sector reads. > > > > We have already excluded problems of node itself so we are > focusing on the applications running on the node. What we'd like to > to is to associate the I/O requests either with files or specific > processes running on the machine in order to be able to blame the > correct application. Can somebody tell us, if this is possible and > if now, if there are other ways to understand what application is > causing this? > > > > Thanks, > > > > Andreas > > > > -- > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From makaplan at us.ibm.com Tue May 30 15:39:50 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 May 2017 14:39:50 +0000 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? 
was: mmbackup with fileset : scope errors In-Reply-To: <20170529160138.18847jpj5x9kz8ki@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Regarding mmbackup and TSM INCLUDE/EXCLUDE, I found this doc by googling... http://www-01.ibm.com/support/docview.wss?uid=swg21699569 Which says, among other things and includes many ifs,and,buts : "... include and exclude options are interpreted differently by the IBM Spectrum Scale mmbackup command and by the IBM Spectrum Protect backup-archive client..." I think mmbackup tries to handle usual, sensible, variants of the TSM directives that can be directly "translated" to more logical SQL, so you don't have to follow all the twists, but if it isn't working as you expected... RTFM... OTOH... If you are like or can work with the customize-the-policy-rules approach -- that is good too and makes possible finer grain controls. From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/29/2017 04:01 PM Subject: Re: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors Quoting "Marc A Kaplan" : > Easier than hacking mmbackup or writing/editing policy rules, > > mmbackup interprets > your TSM INCLUDE/EXCLUDE configuration statements -- so that is a > supported and recommended way of doing business... Finally got some time to resume testing on this Here is the syntax used (In this test I want to backup /wosgpfs/backmeup only) mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm --tsm-errorlog $logfile -L 4 As far as I can tell, the EXCLUDE statements in the TSM configuration (dsm.opt) are being *ignored*. I tried a couple of formats: 1) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" INCLExcl "/sysadmin/BA/ba-wos/bin/inclexcl" 1a) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore /wosgpfs/junk /wosgpfs/project 1b) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 2) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 3) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup -/wosgpfs/ignore -/wosgpfs/junk -/wosgpfs/project" In another words, all the contents under /wosgpfs are being traversed and going to the TSM backup. Furthermore, even with "-L 4" mmbackup is not logging the list of files being sent to the TSM backup anywhere on the client side. I only get that information from the TSM server side (get filespace). I know that all contents of /wosgpfs are being traversed because I have a tail on /wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update > > If that doesn't do it for your purposes... You're into some light > hacking... So look inside the mmbackup and tsbackup33 scripts and you'll > find some DEBUG variables that should allow for keeping work and temp > files around ... including the generated policy rules. > I'm calling this hacking "light", because I don't think you'll need to > change the scripts, but just look around and see how you can use what's > there to achieve your legitimate purposes. 
Even so, you will have crossed > a line where IBM support is "informal" at best. On the other hand I am having better luck with the customer rules file. The modified template below will traverse only the /wosgpfs/backmeup, as intended, and only backup files modified under that path. I guess I have a working solution that I will try at scale now. [root at wos-gateway02 bin]# cat dsm.opt SERVERNAME TAPENODE3 ARCHSYMLINKASFILE NO DOMAIN "/wosgpfs/backmeup" __________________________________________________________ /* Auto-generated GPFS policy rules file * Generated on Wed May 24 12:12:51 2017 */ /* Server rules for backup server 1 *** TAPENODE3 *** */ RULE EXTERNAL LIST 'mmbackup.1.TAPENODE3' EXEC '/wosgpfs/.mmbackupCfg/BAexecScript.wosgpfs' OPTS '"/wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update" "-servername=TAPENODE3" "-auditlogname=/wosgpfs/mmbackup.audit.wosgpfs.TAPENODE3" "NONE"' RULE 'BackupRule' LIST 'mmbackup.1.TAPENODE3' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' ELSE 'resdnt' END )) WHERE ( NOT ( (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (PATH_NAME LIKE '/%/.g2w/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/ignore/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/junk/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/project/%') OR /* DO NOT TRAVERSE OR BACKUP */ (MODE LIKE 's%') ) ) AND (PATH_NAME LIKE '/%/backmeup/%') /* TRAVERSE AND BACKUP */ AND (MISC_ATTRIBUTES LIKE '%u%') AND ( NOT ( (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') ) ) AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) _________________________________________________________ [root at wos-gateway02 bin]# time ./mmbackup-wos.sh -------------------------------------------------------- mmbackup: Backup of /wosgpfs begins at Mon May 29 15:54:47 EDT 2017. -------------------------------------------------------- Mon May 29 15:54:49 2017 mmbackup:using user supplied policy rules: /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs Mon May 29 15:54:49 2017 mmbackup:Scanning file system wosgpfs Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3]. Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3] Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired]. mmbackup: TSM Summary Information: Total number of objects inspected: 3 Total number of objects backed up: 3 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of bytes inspected: 4096 Total number of bytes transferred: 512 ---------------------------------------------------------- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. 
---------------------------------------------------------- real 0m9.276s user 0m2.906s sys 0m3.212s _________________________________________________________ Thanks for all the help Jaime > > > > > From: Jez Tucker > To: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 03:33 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > When mmbackup has passed the preflight stage (pretty quickly) you'll > find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* > > Best, > > Jez > > > On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is using > as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > > thin air.... Capture the rules mmbackup creates and make small changes to > > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > > Plan.... Then do some dry run recoveries before you really "need" to do a > > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. > > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. 
It would be nice to get comments from > somebody familiar with the inner works of mmbackup. > > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that > the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support > team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused > and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such > thing. > Filesets are kinda like little filesystems within filesystems. Moving > a > file from one fileset to another requires a copy operation. There is > no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that > are > in a separable range of inode numbers - this allows GPFS to efficiently > do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be > represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you > may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor > ESS, > so anyone in this list feel free to give feedback on that page people > with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a > new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. 
exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > > Quoting "Luis Bolinches" : > > Hi > > have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as > inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, > and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > > > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
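A quick, hedged way to check which of the filesets above are independent (i.e. own an inode space) before attempting a fileset-level backup; device and fileset names follow the examples in this thread:

  # -L shows the inode-space column; a dependent fileset reports the inode
  # space of its containing fileset instead of one of its own
  mmlsfileset sgfs1 -L

  # new filesets are dependent unless created with their own inode space
  mmcrfileset sgfs1 sysadmin4 --inode-space new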
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Jez Tucker > Head of Research and Development, Pixit Media > 07764193820 | jtucker at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia.com > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue May 30 16:15:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 May 2017 11:15:11 -0400 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: In version 4.2.3 you can turn on QOS --fine-stats and --pid-stats and get IO operations statistics for each active process on each node. https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmlsqos.htm The statistics allow you to distinguish single sector IOPS from partial block multisector iops from full block multisector iops. Notice that to use this feature you must enable QOS, but by default you start by running with all throttles set at "unlimited". There are some overheads, so you might want to use it only when you need to find the "bad" processes. 
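A minimal sketch of enabling this on one file system (the device name gpfs0 is illustrative, and the exact argument forms for --fine-stats/--pid-stats should be checked against the 4.2.3 mmchqos/mmlsqos pages linked above):

  # keep all throttles unlimited but collect fine-grained per-process stats
  mmchqos gpfs0 --enable --fine-stats 60 --pid-stats yes

  # display the collected per-process counters (CSV output suits the
  # qosplotfine.pl sample mentioned below)
  mmlsqos gpfs0 --fine-stats 60

  # switch the extra collection off again once the culprit is found
  mmchqos gpfs0 --enable --fine-stats 0 --pid-stats no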
It's a little tricky to use effectively, but we give you a sample script that shows some ways to produce, massage and filter the raw data: samples/charts/qosplotfine.pl The data is available in a CSV format, so it's easy to feed into spreadsheets or data bases and crunch... --marc of GPFS. From: "Andreas Petzold (SCC)" To: Date: 05/30/2017 08:17 AM Subject: [gpfsug-discuss] Associating I/O operations with files/processes Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. [attachment "smime.p7s" deleted by Marc A Kaplan/Watson/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Tomasz.Wolski at ts.fujitsu.com Wed May 31 10:33:29 2017 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Wed, 31 May 2017 09:33:29 +0000 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: <5564b22a89744e06ad7003607248f279@R01UKEXCASM223.r01.fujitsu.local> Thank you very much - that?s very helpful and will save us a lot of effort :) Best regards, Tomasz Wolski From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Achim Rehor Sent: Tuesday, May 30, 2017 9:42 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 The statement was always to be release n-1 compatible. with release being (VRMF) 4.2.3.0 so all 4.2 release levels ought to be compatible with all 4.1 levels. As Felipe pointed out below, the mmchconfig RELEASE=latest will not touch the filesystem level. And if you are running remote clusters, you need to be aware, that lifting a filesystem to the latest level (mmchfs -V full) you will loose remote clusters mount ability if they are on a lower level. in these cases use the -V compat flag (and see commands refernce for details) Mit freundlichen Gr??en / Kind regards Achim Rehor ________________________________ Software Technical Support Specialist AIX/ Emea HPC Support [cid:image001.gif at 01D2DA01.B94BC9E0] IBM Certified Advanced Technical Expert - Power Systems with AIX TSCC Software Service, Dept. 7922 Global Technology Services ________________________________ Phone: +49-7034-274-7862 IBM Deutschland E-Mail: Achim.Rehor at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke, Dieter Scholz, Gregor Pillen, Ivo Koerner, Christian Noll Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940 From: "Felipe Knop" > To: gpfsug main discussion list > Date: 05/30/2017 04:54 AM Subject: Re: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Tomasz, The statement below from "Concepts, Planning and Installation Guide" was found to be incorrect and is being withdrawn from the publications. The team is currently working on improvements to the guidance being provided for migration. For a cluster which is not running protocols like NFS/SMB/Object, migration of nodes one-at-a-time from 4.1.1 to 4.2.3 should work. Once all nodes are migrated to 4.2.3, command mmchconfig release=LATEST can be issued to move the cluster to the 4.2.3 level. Note that the command above will not change the file system level. The file system can be moved to the latest level with command mmchfs file-system-name -V full In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? That is expected to work. 
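As a hedged outline of that node-at-a-time flow (package installation and the portability-layer rebuild are distro-specific assumptions; the GPFS commands are the ones named above):

  # per node: stop GPFS, install the 4.2.3 packages, rebuild the portability
  # layer, start GPFS again
  mmshutdown -N nodeA
  # ... install/upgrade the gpfs.* packages on nodeA ...
  /usr/lpp/mmfs/bin/mmbuildgpl
  mmstartup -N nodeA

  # only after every node in the cluster runs 4.2.3:
  mmchconfig release=LATEST

  # lift the file system format last; prefer -V compat while remote clusters
  # still run older code
  mmchfs file-system-name -V full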
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Tomasz.Wolski at ts.fujitsu.com" > To: "gpfsug-discuss at spectrumscale.org" > Date: 05/29/2017 04:24 PM Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to ?Concepts, Planning and Installation Guide? (for 4.2.3), there?s a limited compatibility between two GPFS versions and if they?re not adjacent, then following update path is advised: ?If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x? My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with ?mmchconfig release=LATEST? until all nodes in the cluster will have been updated to version 4.2.3? In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) [cid:image002.gif at 01CE62B9.8ACFA960] FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 7182 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.gif Type: image/gif Size: 2774 bytes Desc: image002.gif URL: From Tomasz.Wolski at ts.fujitsu.com Wed May 31 11:00:02 2017 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Wed, 31 May 2017 10:00:02 +0000 Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/ Message-ID: <8b209bc526024c49a4a002608f354b3c@R01UKEXCASM223.r01.fujitsu.local> Hello All, It seems that GPFS 4.2.3 does not create block device under /dev for new filesystems anymore - is this behavior intended? In manuals, there's nothing mentioned about this change. For example, having GPFS filesystem gpfs100 with mountpoint /cache/100, /proc/mounts has following entry: gpfs100 /cache/100 gpfs rw,relatime 0 0 where in older releases it used to be /dev/gpfs100 /cache/100 gpfs rw,relatime 0 0 Is there any option (i.e. supplied for mmcrfs) to have these device in /dev/ still in version 4.2.3? 
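A small sketch for confirming a mount when no /dev node exists (reusing the gpfs100 example above; an illustrative check, verify the options locally):

  # where the file system is mounted across the cluster
  mmlsmount gpfs100 -L
  # the configured default mount point
  mmlsfs gpfs100 -T
  # the kernel's view, as in the /proc/mounts line quoted above
  grep gpfs100 /proc/mounts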
With best regards / Mit freundlichen Grüßen / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) [cid:image002.gif at 01CE62B9.8ACFA960] FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2774 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Wed May 31 12:13:01 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 31 May 2017 11:13:01 +0000 Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/ Message-ID: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com> This was a documented change back in (I think) GPFS 4.2.0, but I'd have to go back over the old release notes. It can't be changed. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Tomasz.Wolski at ts.fujitsu.com" Reply-To: gpfsug main discussion list Date: Wednesday, May 31, 2017 at 5:00 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Missing gpfs filesystem device under /dev/ It seems that GPFS 4.2.3 does not create block device under /dev for new filesystems anymore - is this behavior intended? In manuals, there's nothing mentioned about this change. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed May 31 12:25:13 2017 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 31 May 2017 07:25:13 -0400 Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/ In-Reply-To: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com> References: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com> Message-ID: The change actually occurred in 4.2.1 to better integrate GPFS with systemd on RHEL 7.x. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 05/31/2017 07:13 AM Subject: Re: [gpfsug-discuss] Missing gpfs filesystem device under /dev/ Sent by: gpfsug-discuss-bounces at spectrumscale.org This was a documented change back in (I think) GPFS 4.2.0, but I'd have to go back over the old release notes. It can't be changed. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Tomasz.Wolski at ts.fujitsu.com" Reply-To: gpfsug main discussion list Date: Wednesday, May 31, 2017 at 5:00 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Missing gpfs filesystem device under /dev/ It seems that GPFS 4.2.3 does not create block device under /dev for new filesystems anymore - is this behavior intended? In manuals, there's nothing mentioned about this change. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
From SAnderson at convergeone.com Wed May 3 18:08:36 2017 From: SAnderson at convergeone.com (Shaun Anderson) Date: Wed, 3 May 2017 17:08:36 +0000 Subject: [gpfsug-discuss] Tiebreaker disk question Message-ID: <1493831316163.52984@convergeone.com> We noticed some odd behavior recently. I have a customer with a small Scale (with Archive on top) configuration that we recently updated to a dual node configuration. We are using CES and setup a very small 3 nsd shared-root filesystem(gpfssr). We also set up tiebreaker disks and figured it would be ok to use the gpfssr NSDs for this purpose. When we tried to perform some basic failover testing, both nodes came down.
It appears from the logs that when we initiated the node failure (via mmshutdown command...not great, I know) it unmounts and remounts the shared-root filesystem. When it did this, the cluster lost access to the tiebreaker disks, figured it had lost quorum and the other node came down as well. We got around this by changing the tiebreaker disks to our other normal gpfs filesystem. After that failover worked as expected. This is documented nowhere as far as I could find?. I wanted to know if anybody else had experienced this and if this is expected behavior. All is well now and operating as we want so I don't think we'll pursue a support request. Regards, SHAUN ANDERSON STORAGE ARCHITECT O 208.577.2112 M 214.263.7014 NOTICE: This email message and any attachments here to may contain confidential information. Any unauthorized review, use, disclosure, or distribution of such information is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy the original message and all copies of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Thu May 4 06:27:11 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Thu, 04 May 2017 05:27:11 +0000 Subject: [gpfsug-discuss] Tiebreaker disk question In-Reply-To: <1493831316163.52984@convergeone.com> References: <1493831316163.52984@convergeone.com> Message-ID: This doesn't sound like normal behaviour. It shouldn't matter which filesystem your tiebreaker disks belong to. I think the failure was caused by something else, but am not able to guess from the little information you posted.. The mmfs.log will probably tell you the reason. -jf ons. 3. mai 2017 kl. 19.08 skrev Shaun Anderson : > We noticed some odd behavior recently. I have a customer with a small > Scale (with Archive on top) configuration that we recently updated to a > dual node configuration. We are using CES and setup a very small 3 > nsd shared-root filesystem(gpfssr). We also set up tiebreaker disks and > figured it would be ok to use the gpfssr NSDs for this purpose. > > > When we tried to perform some basic failover testing, both nodes came > down. It appears from the logs that when we initiated the node failure > (via mmshutdown command...not great, I know) it unmounts and remounts the > shared-root filesystem. When it did this, the cluster lost access to the > tiebreaker disks, figured it had lost quorum and the other node came down > as well. > > > We got around this by changing the tiebreaker disks to our other normal > gpfs filesystem. After that failover worked as expected. This is > documented nowhere as far as I could find?. I wanted to know if anybody > else had experienced this and if this is expected behavior. All is well > now and operating as we want so I don't think we'll pursue a support > request. > > > Regards, > > *SHAUN ANDERSON* > STORAGE ARCHITECT > O 208.577.2112 > M 214.263.7014 > > > NOTICE: This email message and any attachments here to may contain > confidential > information. Any unauthorized review, use, disclosure, or distribution of > such > information is prohibited. If you are not the intended recipient, please > contact > the sender by reply email and destroy the original message and all copies > of it. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Thu May 4 08:56:09 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 09:56:09 +0200 Subject: [gpfsug-discuss] Tiebreaker disk question In-Reply-To: References: <1493831316163.52984@convergeone.com> Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 4 14:15:40 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 4 May 2017 13:15:40 +0000 Subject: [gpfsug-discuss] HAWC question Message-ID: Hi, I have a question about HAWC, we are trying to enable this for our OpenStack environment, system pool is on SSD already, so we try to change the log file size with: mmchfs FSNAME -L 128M This says: mmchfs: Attention: You must restart the GPFS daemons before the new log file size takes effect. The GPFS daemons can be restarted one node at a time. When the GPFS daemon is restarted on the last node in the cluster, the new log size becomes effective. We multi-cluster the file-system, so do we have to restart every node in all clusters, or just in the storage cluster? And how do we tell once it has become active? Thanks Simon From kenneth.waegeman at ugent.be Thu May 4 14:22:25 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 4 May 2017 15:22:25 +0200 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov> <4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be> <67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov> <9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be> <7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi, We found out using ib_read_bw and ib_write_bw that there were some links between server and clients degraded, having a bandwith of 350MB/s strangely, nsdperf did not report the same. It reported 12GB/s write and 9GB/s read, which was much more then we actually could achieve. So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! Many thanks for the help !! We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? Cheers! Kenneth On 21/04/17 22:27, Kumaran Rajaram wrote: > Hi Kenneth, > > As it was mentioned earlier, it will be good to first verify the raw > network performance between the NSD client and NSD server using the > nsdperf tool that is built with RDMA support. > g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C > > In addition, since you have 2 x NSD servers it will be good to perform > NSD client file-system performance test with just single NSD server > (mmshutdown the other server, assuming all the NSDs have primary, > server NSD server configured + Quorum will be intact when a NSD server > is brought down) to see if it helps to improve the read performance + > if there are variations in the file-system read bandwidth results > between NSD_server#1 'active' vs. NSD_server #2 'active' (with other > NSD server in GPFS "down" state). If there is significant variation, > it can help to isolate the issue to particular NSD server (HW or IB > issue?). > > You can issue "mmdiag --waiters" on NSD client as well as NSD servers > during your dd test, to verify if there are unsual long GPFS waiters. 
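For example, a small hedged loop for capturing waiters on a server while the dd test runs (interval and log path are arbitrary choices):

  # sample the top waiters every 5 seconds for later comparison
  while true; do
      date >> /tmp/waiters.log
      mmdiag --waiters | head -20 >> /tmp/waiters.log
      sleep 5
  done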
> In addition, you may issue Linux "perf top -z" command on the GPFS > node to see if there is high CPU usage by any particular call/event > (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been > set to low value from the default 16M, then it can cause RDMA > completion threads to go CPU bound ). Please verify some performance > scenarios detailed in Chapter 22 in Spectrum Scale Problem > Determination Guide (link below). > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc > > Thanks, > -Kums > > > > > > From: Kenneth Waegeman > To: gpfsug main discussion list > Date: 04/21/2017 11:43 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Hi, > > We already verified this on our nsds: > > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed > QpiSpeed=maxdatarate > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode > turbomode=enable > [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile > SysProfile=perfoptimized > > so sadly this is not the issue. > > Also the output of the verbs commands look ok, there are connections > from the client to the nsds are there is data being read and writen. > > Thanks again! > > Kenneth > > > On 21/04/17 16:01, Kumaran Rajaram wrote: > Hi, > > Try enabling the following in the BIOS of the NSD servers (screen > shots below) > > * Turbo Mode - Enable > * QPI Link Frequency - Max Performance > * Operating Mode - Maximum Performance > * >>>>While we have even better performance with sequential reads on > raw storage LUNS, using GPFS we can only reach 1GB/s in total > (each nsd server seems limited by 0,5GB/s) independent of the > number of clients > > >>We are testing from 2 testing machines connected to the nsds > with infiniband, verbs enabled. > > > Also, It will be good to verify that all the GPFS nodes have Verbs > RDMA started using "mmfsadm test verbs status" and that the NSD > client-server communication from client to server during "dd" is > actually using Verbs RDMA using "mmfsadm test verbs conn" command (on > NSD client doing dd). If not, then GPFS might be using TCP/IP network > over which the cluster is configured impacting performance (If this is > the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and > resolve). > > * > > > > > > > Regards, > -Kums > > > > > > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > __ > To: gpfsug main discussion list __ > > Date: 04/21/2017 09:11 AM > Subject: Re: [gpfsug-discuss] bizarre performance behavior > Sent by: _gpfsug-discuss-bounces at spectrumscale.org_ > > ------------------------------------------------------------------------ > > > > Fantastic news! It might also be worth running "cpupower monitor" or > "turbostat" on your NSD servers while you're running dd tests from the > clients to see what CPU frequency your cores are actually running at. > > A typical NSD server workload (especially with IB verbs and for reads) > can be pretty light on CPU which might not prompt your CPU crew > governor to up the frequency (which can affect throughout). If your > frequency scaling governor isn't kicking up the frequency of your CPUs > I've seen that cause this behavior in my testing. 
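A hedged example of checking and pinning the governor on an NSD server (cpupower ships with the distro's kernel tools; the performance governor is an assumption to test, not a blanket recommendation):

  # show the active driver, governor and frequency limits
  cpupower frequency-info
  # watch what the cores actually run at while the dd test is going
  cpupower monitor
  # pin the governor so a light NSD workload is not left at a low clock
  cpupower frequency-set -g performance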
> > -Aaron > > > > > On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > We are running a test setup with 2 NSD Servers backed by 4 Dell > Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of > the 4 powervaults, nsd02 is primary serving LUNS of controller B. > > We are testing from 2 testing machines connected to the nsds with > infiniband, verbs enabled. > > When we do dd from the NSD servers, we see indeed performance going to > 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is > able to get the data at a decent speed. Since we can write from the > clients at a good speed, I didn't suspect the communication between > clients and nsds being the issue, especially since total performance > stays the same using 1 or multiple clients. > > I'll use the nsdperf tool to see if we can find anything, > > thanks! > > K > > On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE > CORP] wrote: > Interesting. Could you share a little more about your architecture? Is > it possible to mount the fs on an NSD server and do some dd's from the > fs on the NSD server? If that gives you decent performance perhaps try > NSDPERF next > _https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf_ > > > -Aaron > > > > > On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman > __ wrote: > > Hi, > > Having an issue that looks the same as this one: > > We can do sequential writes to the filesystem at 7,8 GB/s total , > which is the expected speed for our current storage > backend. While we have even better performance with sequential reads > on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each > nsd server seems limited by 0,5GB/s) independent of the number of clients > (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, > MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed > in this thread, but nothing seems to impact this read performance. > > Any ideas? > > Thanks! > > Kenneth > > On 17/02/17 19:29, Jan-Frode Myklebust wrote: > I just had a similar experience from a sandisk infiniflash system > SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for > writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on > the order of 2 Gbyte/s. > > After a bit head scratching snd fumbling around I found out that > reducing maxMBpS from 10000 to 100 fixed the problem! Digging further > I found that reducing prefetchThreads from default=72 to 32 also fixed > it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. > > Could something like this be the problem on your box as well? > > > > -jf > fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister > <_aaron.s.knister at nasa.gov_ >: > Well, I'm somewhat scrounging for hardware. This is in our test > environment :) And yep, it's got the 2U gpu-tray in it although even > without the riser it has 2 PCIe slots onboard (excluding the on-board > dual-port mezz card) so I think it would make a fine NSD server even > without the riser. > > -Aaron > > On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) > wrote: > > Maybe its related to interrupt handlers somehow? You drive the load > up on one socket, you push all the interrupt handling to the other > socket where the fabric card is attached? > > > > Dunno ... 
(Though I am intrigued you use idataplex nodes as NSD > servers, I assume its some 2U gpu-tray riser one or something !) > > > > Simon > > ________________________________________ > > From: _gpfsug-discuss-bounces at spectrumscale.org_ > [_gpfsug-discuss-bounces at spectrumscale.org_ > ] on behalf of Aaron > Knister [_aaron.s.knister at nasa.gov_ ] > > Sent: 17 February 2017 15:52 > > To: gpfsug main discussion list > > Subject: [gpfsug-discuss] bizarre performance behavior > > > > This is a good one. I've got an NSD server with 4x 16GB fibre > > connections coming in and 1x FDR10 and 1x QDR connection going out to > > the clients. I was having a really hard time getting anything resembling > > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > > reads). The back-end is a DDN SFA12K and I *know* it can do better than > > that. > > > > I don't remember quite how I figured this out but simply by running > > "openssl speed -multi 16" on the nsd server to drive up the load I saw > > an almost 4x performance jump which is pretty much goes against every > > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > > quadruple your i/o performance"). > > > > This feels like some type of C-states frequency scaling shenanigans that > > I haven't quite ironed down yet. I booted the box with the following > > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > > didn't seem to make much of a difference. I also tried setting the > > frequency governer to userspace and setting the minimum frequency to > > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > > to run something to drive up the CPU load and then performance improves. > > > > I'm wondering if this could be an issue with the C1E state? I'm curious > > if anyone has seen anything like this. The node is a dx360 M4 > > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at _spectrumscale.org_ > > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at _spectrumscale.org_ _ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From oehmes at gmail.com Thu May 4 14:28:20 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 04 May 2017 13:28:20 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: Message-ID: well, it's a bit complicated which is why the message is there in the first place. reason is, there is no easy way to tell except by dumping the stripgroup on the filesystem manager and check what log group your particular node is assigned to and then check the size of the log group. as soon as the client node gets restarted it should in most cases pick up a new log group and that should be at the new size, but to be 100% sure we say all nodes need to be restarted. 
you need to also turn HAWC on as well, i assume you just left this out of the email , just changing log size doesn't turn it on :-) On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Hi, > > I have a question about HAWC, we are trying to enable this for our > OpenStack environment, system pool is on SSD already, so we try to change > the log file size with: > > mmchfs FSNAME -L 128M > > This says: > > mmchfs: Attention: You must restart the GPFS daemons before the new log > file > size takes effect. The GPFS daemons can be restarted one node at a time. > When the GPFS daemon is restarted on the last node in the cluster, the new > log size becomes effective. > > > We multi-cluster the file-system, so do we have to restart every node in > all clusters, or just in the storage cluster? > > And how do we tell once it has become active? > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 4 14:39:33 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 4 May 2017 13:39:33 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: , Message-ID: Which cluster though? The client and storage are separate clusters, so all the nodes on the remote cluster or storage cluster? Thanks Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of oehmes at gmail.com [oehmes at gmail.com] Sent: 04 May 2017 14:28 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] HAWC question well, it's a bit complicated which is why the message is there in the first place. reason is, there is no easy way to tell except by dumping the stripgroup on the filesystem manager and check what log group your particular node is assigned to and then check the size of the log group. as soon as the client node gets restarted it should in most cases pick up a new log group and that should be at the new size, but to be 100% sure we say all nodes need to be restarted. you need to also turn HAWC on as well, i assume you just left this out of the email , just changing log size doesn't turn it on :-) On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) > wrote: Hi, I have a question about HAWC, we are trying to enable this for our OpenStack environment, system pool is on SSD already, so we try to change the log file size with: mmchfs FSNAME -L 128M This says: mmchfs: Attention: You must restart the GPFS daemons before the new log file size takes effect. The GPFS daemons can be restarted one node at a time. When the GPFS daemon is restarted on the last node in the cluster, the new log size becomes effective. We multi-cluster the file-system, so do we have to restart every node in all clusters, or just in the storage cluster? And how do we tell once it has become active? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu May 4 15:06:10 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 04 May 2017 14:06:10 +0000 Subject: [gpfsug-discuss] HAWC question In-Reply-To: References: Message-ID: let me clarify and get back, i am not 100% sure on a cross cluster , i think the main point was that the FS manager for that fs should be reassigned (which could also happen via mmchmgr) and then the individual clients that mount that fs restarted , but i will double check and reply later . On Thu, May 4, 2017 at 6:39 AM Simon Thompson (IT Research Support) < S.J.Thompson at bham.ac.uk> wrote: > Which cluster though? The client and storage are separate clusters, so all > the nodes on the remote cluster or storage cluster? > > Thanks > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of oehmes at gmail.com [ > oehmes at gmail.com] > Sent: 04 May 2017 14:28 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] HAWC question > > well, it's a bit complicated which is why the message is there in the > first place. > > reason is, there is no easy way to tell except by dumping the stripgroup > on the filesystem manager and check what log group your particular node is > assigned to and then check the size of the log group. > > as soon as the client node gets restarted it should in most cases pick up > a new log group and that should be at the new size, but to be 100% sure we > say all nodes need to be restarted. > > you need to also turn HAWC on as well, i assume you just left this out of > the email , just changing log size doesn't turn it on :-) > > On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) < > S.J.Thompson at bham.ac.uk> wrote: > Hi, > > I have a question about HAWC, we are trying to enable this for our > OpenStack environment, system pool is on SSD already, so we try to change > the log file size with: > > mmchfs FSNAME -L 128M > > This says: > > mmchfs: Attention: You must restart the GPFS daemons before the new log > file > size takes effect. The GPFS daemons can be restarted one node at a time. > When the GPFS daemon is restarted on the last node in the cluster, the new > log size becomes effective. > > > We multi-cluster the file-system, so do we have to restart every node in > all clusters, or just in the storage cluster? > > And how do we tell once it has become active? > > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 16:24:41 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 15:24:41 +0000 Subject: [gpfsug-discuss] Well, this is the pits... Message-ID: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Hi All, Another one of those, ?I can open a PMR if I need to? type questions? 
We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 16:34:34 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 17:34:34 +0200 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 16:43:56 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 15:43:56 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> Message-ID: <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. 
than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 4 16:45:53 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 4 May 2017 15:45:53 +0000 Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance Message-ID: Hi all, I'm going through the steps outlines in the following article: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_managecertforgui.htm Will this work for 4.2.1 installations? Only because in step 5, "Generate a Java(tm) keystore file (.jks) by using the keytool. It is stored in the following directory:", the given directory - /opt/ibm/wlp/java/jre/bin - does not exist. Only the path upto and including wlp is on my GUI server. I can't imagine the instructions being so different between 4.2.1 and 4.2 but I've seen it happen.. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu May 4 16:54:50 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 4 May 2017 17:54:50 +0200 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu May 4 16:55:36 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 4 May 2017 15:55:36 +0000 Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance In-Reply-To: References: Message-ID: Never mind - /usr/lpp/mmfs/java/jre/bin is where it's at. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 04 May 2017 16:46 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Replace SSL cert in GUI - need guidance Hi all, I'm going through the steps outlines in the following article: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_managecertforgui.htm Will this work for 4.2.1 installations? Only because in step 5, "Generate a Java(tm) keystore file (.jks) by using the keytool. It is stored in the following directory:", the given directory - /opt/ibm/wlp/java/jre/bin - does not exist. Only the path upto and including wlp is on my GUI server. I can't imagine the instructions being so different between 4.2.1 and 4.2 but I've seen it happen.. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 17:07:32 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 16:07:32 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> Message-ID: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? 
QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From salut4tions at gmail.com Thu May 4 17:11:53 2017 From: salut4tions at gmail.com (Jordan Robertson) Date: Thu, 4 May 2017 12:11:53 -0400 Subject: [gpfsug-discuss] Well, this is the pits... 
In-Reply-To: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Kevin, The math currently used in the code appears to be "greater than 31 NSD's in the filesystem" combined with "greater than 31 pit worker threads", explicitly for a balancing restripe (we actually hit that combo on an older version of 3.5.x before the safety got written in there...it was a long day). At least, that's the apparent math used through 4.1.1.10, which we're currently running. If pitWorkerThreadsPerNode is set to 0 (default), GPFS should set the active thread number equal to the number of cores in the node, to a max of 16 threads I believe. Take in mind that for a restripe, it will also include the threads available on the fs manager. So if your fs manager and at least one helper node are both set to "0", and each contains at least 16 cores, the restripe "thread calculation" will exceed 31 threads so it won't run. We've had to tune our helper nodes to lower numbers (e.g a single helper node to 15 threads). Aaron please correct me if I'm braining that wrong anywhere. -Jordan On Thu, May 4, 2017 at 12:07 PM, Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > Hi Olaf, > > I didn?t touch pitWorkerThreadsPerNode ? it was already zero. > > I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or > 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes > this? With what I?m doing I need the ability to run mmrestripefs. > > It seems to me that mmrestripefs could check whether QOS is enabled ? > granted, it would have no way of knowing whether the values used actually > are reasonable or not ? but if QOS is enabled then ?trust? it to not > overrun the system. > > PMR time? Thanks.. > > Kevin > > On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: > > HI Kevin, > the number of NSDs is more or less nonsense .. it is just the number of > nodes x PITWorker should not exceed to much the #mutex/FS block > did you adjust/tune the PitWorker ? ... > > so far as I know.. that the code checks the number of NSDs is already > considered as a defect and will be fixed / is already fixed ( I stepped > into it here as well) > > ps. QOS is the better approach to address this, but unfortunately.. not > everyone is using it by default... that's why I suspect , the development > decide to put in a check/limit here .. which in your case(with QOS) > would'nt needed > > > > > > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM > Subject: Re: [gpfsug-discuss] Well, this is the pits... > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi Olaf, > > Your explanation mostly makes sense, but... > > Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. > And this filesystem only has 32 disks, which I would imagine is not an > especially large number compared to what some people reading this e-mail > have in their filesystems. > > I thought that QOS (which I?m using) was what would keep an mmrestripefs > from overrunning the system ? QOS has worked extremely well for us - it?s > one of my favorite additions to GPFS. > > Kevin > > On May 4, 2017, at 10:34 AM, Olaf Weiser <*olaf.weiser at de.ibm.com* > > wrote: > > no.. 
it is just in the code, because we have to avoid to run out of mutexs > / block > > reduce the number of nodes -N down to 4 (2nodes is even more safer) ... > is the easiest way to solve it for now.... > > I've been told the real root cause will be fixed in one of the next ptfs > .. within this year .. > this warning messages itself should appear every time.. but unfortunately > someone coded, that it depends on the number of disks (NSDs).. that's why I > suspect you did'nt see it before > but the fact , that we have to make sure, not to overrun the system by > mmrestripe remains.. to please lower the -N number of nodes to 4 or better > 2 > > (even though we know.. than the mmrestripe will take longer) > > > From: "Buterbaugh, Kevin L" <*Kevin.Buterbaugh at Vanderbilt.Edu* > > > To: gpfsug main discussion list <*gpfsug-discuss at spectrumscale.org* > > > Date: 05/04/2017 05:26 PM > Subject: [gpfsug-discuss] Well, this is the pits... > Sent by: *gpfsug-discuss-bounces at spectrumscale.org* > > ------------------------------ > > > > Hi All, > > Another one of those, ?I can open a PMR if I need to? type questions? > > We are in the process of combining two large GPFS filesystems into one new > filesystem (for various reasons I won?t get into here). Therefore, I?m > doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. > > Yesterday I did an ?mmrestripefs -r -N ? (after > suspending a disk, of course). Worked like it should. > > Today I did a ?mmrestripefs -b -P capacity -N servers>? and got: > > mmrestripefs: The total number of PIT worker threads of all participating > nodes has been exceeded to safely restripe the file system. The total > number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode > of the participating nodes, cannot exceed 31. Reissue the command with a > smaller set of participating nodes (-N option) and/or lower the > pitWorkerThreadsPerNode configure setting. By default the file system > manager node is counted as a participating node. > mmrestripefs: Command failed. Examine previous error messages to determine > cause. > > So there must be some difference in how the ?-r? and ?-b? options > calculate the number of PIT worker threads. I did an ?mmfsadm dump all | > grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem > manager node ? they all say the same thing: > > pitWorkerThreadsPerNode 0 > > Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > *Kevin.Buterbaugh at vanderbilt.edu* - > (615)875-9633 <(615)%20875-9633> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 17:49:20 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 12:49:20 -0400 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu><982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. 
that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 17:56:26 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 16:56:26 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram > wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. 
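For reference, a minimal sketch of capping the QoS maintenance class so that a restripe cannot swamp the back end (the file system name and the IOPS figure are placeholders, not values from this thread):

# Throttle maintenance commands (mmrestripefs, mmdeldisk, ...) while leaving
# normal application I/O in the 'other' class unthrottled.
mmchqos gpfs23 --enable 'pool=*,maintenance=5000IOPS,other=unlimited'

# Watch per-class consumption while the restripe runs.
mmlsqos gpfs23 --seconds 60

Long-running maintenance commands such as mmrestripefs are charged to the maintenance class by default, which is why an enabled QoS setup is expected to keep them in check.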
PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? 
on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu May 4 18:15:16 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 May 2017 17:15:16 +0000 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu> <982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu> <27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: <8E68031C-8362-468B-873F-2B3D3B2A15B7@vanderbilt.edu> Hi Stephen, My apologies - Jordan?s response had been snagged by the University's SPAM filter (I went and checked and found it after receiving your e-mail)? Kevin On May 4, 2017, at 12:04 PM, Stephen Ulmer > wrote: Look at Jordan?s answer, he explains what significance 0 has. In short, GPFS will use one thread per core per server, so they could add to 31 quickly. ;) -- Stephen On May 4, 2017, at 12:56 PM, Buterbaugh, Kevin L > wrote: Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram > wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". 
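Put together as a quick sketch for 4.2.2.3 (the node list and file system name below are placeholders, not values from this thread):

# Keep (participating nodes) x pitWorkerThreadsPerNode <= 31; with 6 nodes,
# 5 threads each gives 30.
mmchconfig pitWorkerThreadsPerNode=5 -N nsd01,nsd02,nsd03,nsd04,nsd05,nsd06

# The change only takes effect after GPFS is recycled on those nodes,
# one at a time so quorum and NSD service stay up.
mmshutdown -N nsd01 && mmstartup -N nsd01    # repeat for each node in the list

# Verify the running value on each node.
mmfsadm dump config | grep pitWorkerThreadsPerNode

# Rerun the rebalance against the same node list. The file system manager
# counts as a participating node even if it is not in -N, so either include
# it in the list or lower the per-node thread count accordingly.
mmrestripefs gpfs23 -b -P capacity -N nsd01,nsd02,nsd03,nsd04,nsd05,nsd06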
GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser > wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser > wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. 
Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 18:20:41 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 13:20:41 -0400 Subject: [gpfsug-discuss] Well, this is the pits... In-Reply-To: References: <1799E310-614B-4704-BB37-EE9DF3F079C2@vanderbilt.edu><982D4505-940B-47DA-9C1B-46A67EA81222@vanderbilt.edu><27A45E12-2B1A-4E4B-98B6-560979BFA135@vanderbilt.edu> Message-ID: >>Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? pitWorkerThreadsPerNode -- Specifies how many threads do restripe, data movement, etc >>As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Value of 0 just indicates pitWorkerThreadsPerNode takes internal_value based on GPFS setup and file-system configuration (which can be 16 or lower) based on the following formula. 
Default is pitWorkerThreadsPerNode = MIN(16, (numberOfDisks_in_filesystem * 4) / numberOfParticipatingNodes_in_mmrestripefs + 1) For example, if you have 64 x NSDs in your file-system and you are using 8 NSD servers in "mmrestripefs -N", then pitWorkerThreadsPerNode = MIN (16, (256/8)+1) resulting in pitWorkerThreadsPerNode to take value of 16 ( default 0 will result in 16 threads doing restripe per mmrestripefs participating Node). If you want 8 NSD servers (running 4.2.2.3) to participate in mmrestripefs operation then set "mmchconfig pitWorkerThreadsPerNode=3 -N <8_NSD_Servers>" such that (8 x 3) is less than 31. Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:57 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Kums, Thanks for the info on the releases ? can you clarify about pitWorkerThreadsPerNode? As I said in my original post, on all 8 NSD servers and the filesystem manager it is set to zero. No matter how many times I add zero to zero I don?t get a value > 31! ;-) So I take it that zero has some sort of unspecified significance? Thanks? Kevin On May 4, 2017, at 11:49 AM, Kumaran Rajaram wrote: Hi, >>I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. GPFS version 4.2.3.0 (and above) fixes this issue and supports "sum of pitWorkerThreadsPerNode of the participating nodes (-N parameter to mmrestripefs)" to exceed 31. If you are using 4.2.2.3, then depending on "number of nodes participating in the mmrestripefs" then the GPFS config parameter "pitWorkerThreadsPerNode" need to be adjusted such that "sum of pitWorkerThreadsPerNode of the participating nodes <= 31". For example, if "number of nodes participating in the mmrestripefs" is 6 then adjust "mmchconfig pitWorkerThreadsPerNode=5 -N ". GPFS would need to be restarted for this parameter to take effect on the participating_nodes (verify with mmfsadm dump config | grep pitWorkerThreadsPerNode) Regards, -Kums From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 12:08 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, I didn?t touch pitWorkerThreadsPerNode ? it was already zero. I?m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 and are gradually being upgraded). What version of GPFS fixes this? With what I?m doing I need the ability to run mmrestripefs. It seems to me that mmrestripefs could check whether QOS is enabled ? granted, it would have no way of knowing whether the values used actually are reasonable or not ? but if QOS is enabled then ?trust? it to not overrun the system. PMR time? Thanks.. Kevin On May 4, 2017, at 10:54 AM, Olaf Weiser wrote: HI Kevin, the number of NSDs is more or less nonsense .. it is just the number of nodes x PITWorker should not exceed to much the #mutex/FS block did you adjust/tune the PitWorker ? ... so far as I know.. that the code checks the number of NSDs is already considered as a defect and will be fixed / is already fixed ( I stepped into it here as well) ps. QOS is the better approach to address this, but unfortunately.. not everyone is using it by default... that's why I suspect , the development decide to put in a check/limit here .. 
which in your case(with QOS) would'nt needed From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:44 PM Subject: Re: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Olaf, Your explanation mostly makes sense, but... Failed with 4 nodes ? failed with 2 nodes ? not gonna try with 1 node. And this filesystem only has 32 disks, which I would imagine is not an especially large number compared to what some people reading this e-mail have in their filesystems. I thought that QOS (which I?m using) was what would keep an mmrestripefs from overrunning the system ? QOS has worked extremely well for us - it?s one of my favorite additions to GPFS. Kevin On May 4, 2017, at 10:34 AM, Olaf Weiser wrote: no.. it is just in the code, because we have to avoid to run out of mutexs / block reduce the number of nodes -N down to 4 (2nodes is even more safer) ... is the easiest way to solve it for now.... I've been told the real root cause will be fixed in one of the next ptfs .. within this year .. this warning messages itself should appear every time.. but unfortunately someone coded, that it depends on the number of disks (NSDs).. that's why I suspect you did'nt see it before but the fact , that we have to make sure, not to overrun the system by mmrestripe remains.. to please lower the -N number of nodes to 4 or better 2 (even though we know.. than the mmrestripe will take longer) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 05/04/2017 05:26 PM Subject: [gpfsug-discuss] Well, this is the pits... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Another one of those, ?I can open a PMR if I need to? type questions? We are in the process of combining two large GPFS filesystems into one new filesystem (for various reasons I won?t get into here). Therefore, I?m doing a lot of mmrestripe?s, mmdeldisk?s, and mmadddisk?s. Yesterday I did an ?mmrestripefs -r -N ? (after suspending a disk, of course). Worked like it should. Today I did a ?mmrestripefs -b -P capacity -N ? and got: mmrestripefs: The total number of PIT worker threads of all participating nodes has been exceeded to safely restripe the file system. The total number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode of the participating nodes, cannot exceed 31. Reissue the command with a smaller set of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode configure setting. By default the file system manager node is counted as a participating node. mmrestripefs: Command failed. Examine previous error messages to determine cause. So there must be some difference in how the ?-r? and ?-b? options calculate the number of PIT worker threads. I did an ?mmfsadm dump all | grep pitWorkerThreadsPerNode? on all 8 NSD servers and the filesystem manager node ? they all say the same thing: pitWorkerThreadsPerNode 0 Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!? I?m confused... ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kums at us.ibm.com Thu May 4 23:22:12 2017 From: kums at us.ibm.com (Kumaran Rajaram) Date: Thu, 4 May 2017 22:22:12 +0000 Subject: [gpfsug-discuss] bizarre performance behavior In-Reply-To: References: <2a946193-259f-9dcb-0381-12fd571c5413@nasa.gov><4896c9cd-16d0-234d-b867-4787b41910cd@ugent.be><67E31108-39CE-4F37-8EF4-F0B548A4735C@nasa.gov><9dbcde5d-c7b1-717c-f7b9-a5b9665cfa98@ugent.be><7f7349c9-bdd3-5847-1cca-d98d221489fe@ugent.be> Message-ID: Hi, >>So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. >>On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! This is good to hear. >> We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? If you are on 4.2.0.3 or higher, you can use workerThreads config paramter (start with value of 128, and increase in increments of 128 until MAX supported) and this setting will auto adjust values for other parameters such as prefetchThreads, worker3Threads etc. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Tuning%20Parameters In addition to trying larger file-system block-size (e.g. 4MiB or higher such that is aligns with storage volume RAID-stripe-width) and config parameters (e.g , workerThreads, ignorePrefetchLUNCount), it will be good to assess the "backend storage" performance for random I/O access pattern (with block I/O sizes in units of FS block-size) as this is more likely I/O scenario that the backend storage will experience when you have many GPFS nodes performing I/O simultaneously to the file-system (in production environment). mmcrfs has option "[-j {cluster | scatter}]". "-j scatter" would be recommended for consistent file-system performance over the lifetime of the file-system but then "-j scatter" will result in random I/O to backend storage (even though application is performing sequential I/O). 
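A quick sketch of those two suggestions (the node class, stanza file and block size are placeholders, and workerThreads generally needs a GPFS restart on the affected nodes to take effect); the mmcrfs excerpt below gives the full wording on layoutMap:

# Start at 128 and raise in steps of 128; on 4.2.0.3 and above this
# auto-adjusts prefetchThreads, worker3Threads and related parameters.
mmchconfig workerThreads=128 -N nsdClients

# Test file system with a 4 MiB block size (aligned to the RAID stripe width)
# and scatter allocation for consistent multi-client throughput.
mmcrfs gpfstest -F nsd.stanza -B 4M -j scatter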
For your test purposes, you may assess the GPFS file-system performance by mmcrfs with "-j cluster" and you may see good sequential results (compared to -j scatter) for lower client counts but as you scale the client counts the combined workload can result in "-j scatter" to backend storage (limiting the FS performance to random I/O performance of the backend storage). [snip from mmcrfs] layoutMap={scatter | cluster} Specifies the block allocation map type. When allocating blocks for a given file, GPFS first uses a round?robin algorithm to spread the data across all disks in the storage pool. After a disk is selected, the location of the data block on the disk is determined by the block allocation map type. If cluster is specified, GPFS attempts to allocate blocks in clusters. Blocks that belong to a particular file are kept adjacent to each other within each cluster. If scatter is specified, the location of the block is chosen randomly. The cluster allocation method may provide better disk performance for some disk subsystems in relatively small installations. The benefits of clustered block allocation diminish when the number of nodes in the cluster or the number of disks in a file system increases, or when the file system?s free space becomes fragmented. The cluster allocation method is the default for GPFS clusters with eight or fewer nodes and for file systems with eight or fewer disks. The scatter allocation method provides more consistent file system performance by averaging out performance variations due to block location (for many disk subsystems, the location of the data relative to the disk edge has a substantial effect on performance). This allocation method is appropriate in most cases and is the default for GPFS clusters with more than eight nodes or file systems with more than eight disks. The block allocation map type cannot be changed after the storage pool has been created. .. .. -j {cluster | scatter} Specifies the default block allocation map type to be used if layoutMap is not specified for a given storage pool. [/snip from mmcrfs] My two cents, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 05/04/2017 09:23 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We found out using ib_read_bw and ib_write_bw that there were some links between server and clients degraded, having a bandwith of 350MB/s strangely, nsdperf did not report the same. It reported 12GB/s write and 9GB/s read, which was much more then we actually could achieve. So the problem was some bad ib routing. We changed some ib links, and then we got also 12GB/s read with nsdperf. On our clients we then are able to achieve the 7,2GB/s in total we also saw using the nsd servers! Many thanks for the help !! We are now running some tests with different blocksizes and parameters, because our backend storage is able to do more than the 7.2GB/s we get with GPFS (more like 14GB/s in total). I guess prefetchthreads and nsdworkerthreads are the ones to look at? Cheers! Kenneth On 21/04/17 22:27, Kumaran Rajaram wrote: Hi Kenneth, As it was mentioned earlier, it will be good to first verify the raw network performance between the NSD client and NSD server using the nsdperf tool that is built with RDMA support. 
g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C In addition, since you have 2 x NSD servers it will be good to perform NSD client file-system performance test with just single NSD server (mmshutdown the other server, assuming all the NSDs have primary, server NSD server configured + Quorum will be intact when a NSD server is brought down) to see if it helps to improve the read performance + if there are variations in the file-system read bandwidth results between NSD_server#1 'active' vs. NSD_server #2 'active' (with other NSD server in GPFS "down" state). If there is significant variation, it can help to isolate the issue to particular NSD server (HW or IB issue?). You can issue "mmdiag --waiters" on NSD client as well as NSD servers during your dd test, to verify if there are unsual long GPFS waiters. In addition, you may issue Linux "perf top -z" command on the GPFS node to see if there is high CPU usage by any particular call/event (for e.g., If GPFS config parameter verbsRdmaMaxSendBytes has been set to low value from the default 16M, then it can cause RDMA completion threads to go CPU bound ). Please verify some performance scenarios detailed in Chapter 22 in Spectrum Scale Problem Determination Guide (link below). https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/pdf/scale_pdg.pdf?view=kc Thanks, -Kums From: Kenneth Waegeman To: gpfsug main discussion list Date: 04/21/2017 11:43 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We already verified this on our nsds: [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --QpiSpeed QpiSpeed=maxdatarate [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg --turbomode turbomode=enable [root at nsd00 ~]# /opt/dell/toolkit/bin/syscfg ?-SysProfile SysProfile=perfoptimized so sadly this is not the issue. Also the output of the verbs commands look ok, there are connections from the client to the nsds are there is data being read and writen. Thanks again! Kenneth On 21/04/17 16:01, Kumaran Rajaram wrote: Hi, Try enabling the following in the BIOS of the NSD servers (screen shots below) Turbo Mode - Enable QPI Link Frequency - Max Performance Operating Mode - Maximum Performance >>>>While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients >>We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. Also, It will be good to verify that all the GPFS nodes have Verbs RDMA started using "mmfsadm test verbs status" and that the NSD client-server communication from client to server during "dd" is actually using Verbs RDMA using "mmfsadm test verbs conn" command (on NSD client doing dd). If not, then GPFS might be using TCP/IP network over which the cluster is configured impacting performance (If this is the case, GPFS mmfs.log.latest for any Verbs RDMA related errors and resolve). Regards, -Kums From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list Date: 04/21/2017 09:11 AM Subject: Re: [gpfsug-discuss] bizarre performance behavior Sent by: gpfsug-discuss-bounces at spectrumscale.org Fantastic news! It might also be worth running "cpupower monitor" or "turbostat" on your NSD servers while you're running dd tests from the clients to see what CPU frequency your cores are actually running at. 
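For example (the paths and sampling interval are placeholders):

# On the NSD server: sample core frequencies and C-state residency while the test runs.
turbostat -i 5          # or: cpupower monitor -i 5

# On a client: drive sequential reads through the file system at the same time.
dd if=/gpfs/test/bigfile of=/dev/null bs=16M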
A typical NSD server workload (especially with IB verbs and for reads) can be pretty light on CPU which might not prompt your CPU crew governor to up the frequency (which can affect throughout). If your frequency scaling governor isn't kicking up the frequency of your CPUs I've seen that cause this behavior in my testing. -Aaron On April 21, 2017 at 05:43:40 EDT, Kenneth Waegeman wrote: Hi, We are running a test setup with 2 NSD Servers backed by 4 Dell Powervaults MD3460s. nsd00 is primary serving LUNS of controller A of the 4 powervaults, nsd02 is primary serving LUNS of controller B. We are testing from 2 testing machines connected to the nsds with infiniband, verbs enabled. When we do dd from the NSD servers, we see indeed performance going to 5.8GB/s for one nsd, 7.2GB/s for the two! So it looks like GPFS is able to get the data at a decent speed. Since we can write from the clients at a good speed, I didn't suspect the communication between clients and nsds being the issue, especially since total performance stays the same using 1 or multiple clients. I'll use the nsdperf tool to see if we can find anything, thanks! K On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote: Interesting. Could you share a little more about your architecture? Is it possible to mount the fs on an NSD server and do some dd's from the fs on the NSD server? If that gives you decent performance perhaps try NSDPERF next https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf -Aaron On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman wrote: Hi, Having an issue that looks the same as this one: We can do sequential writes to the filesystem at 7,8 GB/s total , which is the expected speed for our current storage backend. While we have even better performance with sequential reads on raw storage LUNS, using GPFS we can only reach 1GB/s in total (each nsd server seems limited by 0,5GB/s) independent of the number of clients (1,2,4,..) or ways we tested (fio,dd). We played with blockdev params, MaxMBps, PrefetchThreads, hyperthreading, c1e/cstates, .. as discussed in this thread, but nothing seems to impact this read performance. Any ideas? Thanks! Kenneth On 17/02/17 19:29, Jan-Frode Myklebust wrote: I just had a similar experience from a sandisk infiniflash system SAS-attached to s single host. Gpfsperf reported 3,2 Gbyte/s for writes. and 250-300 Mbyte/s on sequential reads!! Random reads were on the order of 2 Gbyte/s. After a bit head scratching snd fumbling around I found out that reducing maxMBpS from 10000 to 100 fixed the problem! Digging further I found that reducing prefetchThreads from default=72 to 32 also fixed it, while leaving maxMBpS at 10000. Can now also read at 3,2 GByte/s. Could something like this be the problem on your box as well? -jf fre. 17. feb. 2017 kl. 18.13 skrev Aaron Knister : Well, I'm somewhat scrounging for hardware. This is in our test environment :) And yep, it's got the 2U gpu-tray in it although even without the riser it has 2 PCIe slots onboard (excluding the on-board dual-port mezz card) so I think it would make a fine NSD server even without the riser. -Aaron On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) wrote: > Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached? > > Dunno ... 
(Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !) > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org[ gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [ aaron.s.knister at nasa.gov] > Sent: 17 February 2017 15:52 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] bizarre performance behavior > > This is a good one. I've got an NSD server with 4x 16GB fibre > connections coming in and 1x FDR10 and 1x QDR connection going out to > the clients. I was having a really hard time getting anything resembling > sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for > reads). The back-end is a DDN SFA12K and I *know* it can do better than > that. > > I don't remember quite how I figured this out but simply by running > "openssl speed -multi 16" on the nsd server to drive up the load I saw > an almost 4x performance jump which is pretty much goes against every > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to > quadruple your i/o performance"). > > This feels like some type of C-states frequency scaling shenanigans that > I haven't quite ironed down yet. I booted the box with the following > kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which > didn't seem to make much of a difference. I also tried setting the > frequency governer to userspace and setting the minimum frequency to > 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have > to run something to drive up the CPU load and then performance improves. > > I'm wondering if this could be an issue with the C1E state? I'm curious > if anyone has seen anything like this. The node is a dx360 M4 > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 61023 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 85131 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 84819 bytes Desc: not available URL: From ckrafft at de.ibm.com Fri May 5 18:13:18 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Fri, 5 May 2017 19:13:18 +0200 Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve Message-ID: Hello folks, has anyone made "posotive" experiences with SCSI-3 Pers. Reserve? Is this "method" still valid for Linux? Thank you for any hints and tips! Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19235477.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From scale at us.ibm.com Fri May 5 20:18:12 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 5 May 2017 15:18:12 -0400 Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve In-Reply-To: References: Message-ID: SCSI-3 persistent reserve is still supported as documented in the FAQ. I personally do not have any experience using it. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Christoph Krafft" To: "gpfsug main discussion list" Date: 05/05/2017 01:14 PM Subject: [gpfsug-discuss] Node Failure with SCSI-3 Persistant Reserve Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello folks, has anyone made "posotive" experiences with SCSI-3 Pers. Reserve? Is this "method" still valid for Linux? Thank you for any hints and tips! Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Nicole Reimer, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Stefan Lutz Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon May 8 17:06:22 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 12:06:22 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable Message-ID: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
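For reference, the remote-cluster definitions and security settings on both sides can be compared with the standard commands below (nistCompliance only exists as a setting on the 4.1-and-later clusters):

# on the accessing (client-only) cluster: how the remote clusters and file systems are defined
mmremotecluster show all
mmremotefs show all

# on each cluster: which remote clusters have been granted access, and with which keys
mmauth show all

# security-related settings that have to be compatible across the clusters
mmlsconfig cipherList
mmlsconfig nistCompliance

Comparing this output for the storage cluster that mounts fine (cluster 3) against the new one that fails (cluster 4) is a quick way to spot a mismatch.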
From S.J.Thompson at bham.ac.uk Mon May 8 17:12:35 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Mon, 8 May 2017 16:12:35 +0000 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Do you have multiple networks on the hosts? We've seen this sort of thing when rp_filter is dropping traffic with asynchronous routing. I know you said it's set to only go over IB, but if you have names that resolve onto you Ethernet, and admin name etc are not correct, it might be your problem. If you had 4.2, I'd suggest mmnetverify. I suppose that might work if you copied it out of the 4.x packages anyway? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca] Sent: 08 May 2017 17:06 To: gpfsug main discussion list Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. 
mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From pinto at scinet.utoronto.ca Mon May 8 17:23:01 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 12:23:01 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170508122301.25824jjpcvgd20dh@support.scinet.utoronto.ca> Quoting "Simon Thompson (IT Research Support)" : > Do you have multiple networks on the hosts? We've seen this sort of > thing when rp_filter is dropping traffic with asynchronous routing. > Yes Simon, All clients and servers have multiple interfaces on different networks, but we've been careful to always join nodes with the -ib0 resolution, always on IB. I can also query with 'mmlscluster' and all nodes involved are listed with the 10.20.x.x IP and -ib0 extension on their names. We don't have mmnetverify anywhere yet. Thanks Jaime > I know you said it's set to only go over IB, but if you have names > that resolve onto you Ethernet, and admin name etc are not correct, > it might be your problem. > > If you had 4.2, I'd suggest mmnetverify. I suppose that might work > if you copied it out of the 4.x packages anyway? > > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of > pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca] > Sent: 08 May 2017 17:06 > To: gpfsug main discussion list > Subject: [gpfsug-discuss] help with multi-cluster setup: Network is > unreachable > > We have a setup in which "cluster 0" is made up of clients only on > gpfs v3.5, ie, no NDS's or formal storage on this primary membership. > > All storage for those clients come in a multi-cluster fashion, from > clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). > > We recently added a new storage cluster 4 (4.1.1-14), and for some > obscure reason we keep getting "Network is unreachable" during mount > by clients, even though there were no issues or errors with the > multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' > worked fine, and all clients have an entry in /etc/fstab for the file > system associated with the new cluster 4. The weird thing is that we > can mount cluster 3 fine (also 4.1). > > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. 
As far as > the IB network goes there are no problems routing/pinging around all > the clusters. So this must be internal to GPFS. > > None of the clusters have the subnet parameter set explicitly at > configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem > we need to. All have cipherList AUTHONLY. One difference is that > cluster 4 has DMAPI enabled (don't think it matters). > > Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients > during mount (10.20.179.1 is one of the NDS on cluster 4): > Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node > 10.20.179.1 failed with error 447 (client side). > Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount > wosgpfs.wos-gateway01-ib0:wosgpfs > Mon May 8 11:35:28.783 2017: Network is unreachable > > > I see this reference to "TLS handshake" and error 447, however > according to the manual this TLS is only set to be default on 4.2 > onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. > > mmdiag --network for some of the client gives this excerpt (broken status): > tapenode-ib0 10.20.83.5 > broken 233 -1 0 0 Linux/L > gpc-f114n014-ib0 10.20.114.14 > broken 233 -1 0 0 Linux/L > gpc-f114n015-ib0 10.20.114.15 > broken 233 -1 0 0 Linux/L > gpc-f114n016-ib0 10.20.114.16 > broken 233 -1 0 0 Linux/L > wos-gateway01-ib0 10.20.179.1 > broken 233 -1 0 0 Linux/L > > > > I guess I just need a hint on how to troubleshoot this situation (the > 4.1 troubleshoot guide is not helping). > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From eric.wonderley at vt.edu Mon May 8 17:34:44 2017 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Mon, 8 May 2017 12:34:44 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. 
Vice versa...no so much. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon May 8 17:49:52 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 8 May 2017 16:49:52 +0000 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> Hi Eric, Jamie, Interesting comment as we do exactly the opposite! I always make sure that my servers are running a particular version before I upgrade any clients. Now we never mix and match major versions (i.e. 4.x and 3.x) for long ? those kinds of upgrades we do rapidly. But right now I?ve got clients running 4.2.0-3 talking just fine to 4.2.2.3 servers. To be clear, I?m not saying I?m right and Eric?s wrong at all - just an observation / data point. YMMV? Kevin On May 8, 2017, at 11:34 AM, J. Eric Wonderley > wrote: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. Vice versa...no so much. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon May 8 18:04:22 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 13:04:22 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170508130422.11171a2pqcx35p1y@support.scinet.utoronto.ca> Sorry, I made a mistake on the original description: all our clients are already on 4.1.1-7. Jaime Quoting "J. Eric Wonderley" : > Hi Jamie: > > I think typically you want to keep the clients ahead of the server in > version. I would advance the version of you client nodes. > > New clients can communicate with older versions of server nsds. Vice > versa...no so much. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From mweil at wustl.edu Mon May 8 18:07:03 2017 From: mweil at wustl.edu (Matt Weil) Date: Mon, 8 May 2017 12:07:03 -0500 Subject: [gpfsug-discuss] socketMaxListenConnections and net.core.somaxconn Message-ID: <39b63a8b-2ae7-f9a0-c1c4-319f84fa5354@wustl.edu> Hello all, what happens if we set socketMaxListenConnections to a larger number than we have clients? more memory used? Thanks Matt ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From pinto at scinet.utoronto.ca Mon May 8 18:12:38 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 13:12:38 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> <54F54813-F40D-4F55-B774-272813758EC8@vanderbilt.edu> Message-ID: <20170508131238.632312ooano92cxy@support.scinet.utoronto.ca> I only ask that we look beyond the trivial. The existing multi-cluster setup with mixed versions of servers already work fine with 4000+ clients on 4.1. We still have 3 legacy servers on 3.5, we already have a server on 4.1 also serving fine. The brand new 4.1 server we added last week seems to be at odds for some reason, not that obvious. Thanks Jaime Quoting "Buterbaugh, Kevin L" : > Hi Eric, Jamie, > > Interesting comment as we do exactly the opposite! > > I always make sure that my servers are running a particular version > before I upgrade any clients. Now we never mix and match major > versions (i.e. 4.x and 3.x) for long ? those kinds of upgrades we do > rapidly. But right now I?ve got clients running 4.2.0-3 talking > just fine to 4.2.2.3 servers. > > To be clear, I?m not saying I?m right and Eric?s wrong at all - just > an observation / data point. YMMV? > > Kevin > > On May 8, 2017, at 11:34 AM, J. Eric Wonderley > > wrote: > > Hi Jamie: > > I think typically you want to keep the clients ahead of the server > in version. I would advance the version of you client nodes. > > New clients can communicate with older versions of server nsds. > Vice versa...no so much. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
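A quick way to rule out a plain release-level mismatch between the storage clusters is to run, on one node of each cluster:

# the GPFS build actually running on that node
mmdiag --version

# the minimum release level the cluster as a whole is committed to
mmlsconfig minReleaseLevel

If the new 4.1.1-14 cluster reports a different minReleaseLevel or security configuration than the 4.1.1-7 cluster that mounts without problems, that difference is the first thing worth chasing.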
From valdis.kletnieks at vt.edu Mon May 8 20:48:19 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 08 May 2017 15:48:19 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <13767.1494272899@turing-police.cc.vt.edu> On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. As far as Have you verified that broadcast setting actually works, and packets aren't being discarded as martians? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Mon May 8 21:06:28 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 08 May 2017 16:06:28 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable In-Reply-To: <13767.1494272899@turing-police.cc.vt.edu> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> <13767.1494272899@turing-police.cc.vt.edu> Message-ID: <20170508160628.20766ng8x98ogjpg@support.scinet.utoronto.ca> Quoting valdis.kletnieks at vt.edu: > On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: > >> Another piece og information is that as far as GPFS goes all clusters >> are configured to communicate exclusively over Infiniband, each on a >> different 10.20.x.x network, but broadcast 10.20.255.255. As far as > > Have you verified that broadcast setting actually works, and packets > aren't being discarded as martians? > Yes, we have. They are fine. I'm seeing "failure to join the cluster" messages prior to the "network unreachable" in the mmfslog files, so I'm starting to suspect minor disparities between older releases of 3.5.x.x at one end and newer 4.1.x.x at the other. I'll dig a little more and report the findings. Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From UWEFALKE at de.ibm.com Tue May 9 08:16:23 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 9 May 2017 09:16:23 +0200 Subject: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable In-Reply-To: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: Hi, Jaime, I'd suggest you trace a client while trying to connect and check what addresses it is going to talk to actually. It is a bit tedious, but you will be able to find this in the trace report file. You might also get an idea what's going wrong... Mit freundlichen Gr??en / Kind regards Dr. 
Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 05/08/2017 06:06 PM Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable Sent by: gpfsug-discuss-bounces at spectrumscale.org We have a setup in which "cluster 0" is made up of clients only on gpfs v3.5, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake" and error 447, however according to the manual this TLS is only set to be default on 4.2 onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. mmdiag --network for some of the client gives this excerpt (broken status): tapenode-ib0 10.20.83.5 broken 233 -1 0 0 Linux/L gpc-f114n014-ib0 10.20.114.14 broken 233 -1 0 0 Linux/L gpc-f114n015-ib0 10.20.114.15 broken 233 -1 0 0 Linux/L gpc-f114n016-ib0 10.20.114.16 broken 233 -1 0 0 Linux/L wos-gateway01-ib0 10.20.179.1 broken 233 -1 0 0 Linux/L I guess I just need a hint on how to troubleshoot this situation (the 4.1 troubleshoot guide is not helping). 
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Tue May 9 17:25:00 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Tue, 9 May 2017 16:25:00 +0000 Subject: [gpfsug-discuss] CES and Directory list populating very slowly Message-ID: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue May 9 18:00:22 2017 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 9 May 2017 10:00:22 -0700 Subject: [gpfsug-discuss] CES and Directory list populating very slowly In-Reply-To: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> References: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> Message-ID: ESS nodes have cache, but what matters most for this type of workloads is to have a very large metadata cache, this resides on the CES node for SMB/NFS workloads. 
so if you know that your client will use this 300k directory a lot you want to have a very large maxfilestocache setting on this nodes. alternative solution is to install a LROC device and configure a larger statcache, this helps especially if you have multiple larger directories and want to cache as many as possible from all of them. make sure you have enough tokenmanager and memory on them if you have multiple CES nodes and they all will have high settings. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Mark Bush To: gpfsug main discussion list Date: 05/09/2017 05:25 PM Subject: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Tue May 9 19:58:22 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 9 May 2017 14:58:22 -0400 Subject: [gpfsug-discuss] CES and Directory list populating very slowly In-Reply-To: References: <9A893CC0-1816-4EDE-A06E-8491DF5509D4@siriuscom.com> Message-ID: If you haven't already, measure the time directly on the CES node command line skipping Windows and Samba overheads: time ls -l /path or time ls -lR /path Depending which you're interested in. From: "Sven Oehme" To: gpfsug main discussion list Date: 05/09/2017 01:01 PM Subject: Re: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org ESS nodes have cache, but what matters most for this type of workloads is to have a very large metadata cache, this resides on the CES node for SMB/NFS workloads. so if you know that your client will use this 300k directory a lot you want to have a very large maxfilestocache setting on this nodes. alternative solution is to install a LROC device and configure a larger statcache, this helps especially if you have multiple larger directories and want to cache as many as possible from all of them. make sure you have enough tokenmanager and memory on them if you have multiple CES nodes and they all will have high settings. sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ Mark Bush ---05/09/2017 05:25:39 PM---I have a customer who is struggling (they already have a PMR open and it?s being actively worked on From: Mark Bush To: gpfsug main discussion list Date: 05/09/2017 05:25 PM Subject: [gpfsug-discuss] CES and Directory list populating very slowly Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a customer who is struggling (they already have a PMR open and it?s being actively worked on now). I?m simply seeking understanding of potential places to look. They have an ESS with a few CES nodes in front. Clients connect via SMB to the CES nodes. One fileset has about 300k smallish files in it and when the client opens a windows browser it takes around 30mins to finish populating the files in this SMB share. Here?s where my confusion is. When a client connects to a CES node this is all the job of the CES and it?s protocol services to handle, so in this case CTDB/Samba. But the flow of this is where maybe I?m a little fuzzy. Obviously the CES nodes act as clients to the NSD (IO/nodes in ESS land) servers. So, the data really doesn?t exist on the protocol node but passes things off to the NSD server for regular IO processing. Does the CES node do some type of caching? I?ve heard talk of LROC on CES nodes potentially but I?m curious if all of this is already being stored in the pagepool? What could cause a mostly metadata related simple directory lookup take what seems to the customer a long time for a couple hundred thousand files? Mark This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. 
This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed May 10 02:26:19 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Tue, 09 May 2017 21:26:19 -0400 Subject: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable In-Reply-To: References: <20170508120622.353560406vfoux6m@support.scinet.utoronto.ca> Message-ID: <20170509212619.88345qjpf9ea46kb@support.scinet.utoronto.ca> As it turned out, the 'authorized_keys' file placed in the /var/mmfs/ssl directory of the NDS for the new storage cluster 4 (4.1.1-14) needed an explicit entry of the following format for the bracket associated with clients on cluster 0: nistCompliance=off Apparently the default for 4.1.x is: nistCompliance=SP800-131A I just noticed that on cluster 3 (4.1.1-7) that entry is also present for the bracket associated with clients cluster 0. I guess the Seagate fellows that helped us install the G200 in our facility had that figured out. The original "TLS handshake" error message kind of gave me a hint of the problem, however the 4.1 installation manual specifically mentioned that this could be an issue only on 4.2 onward. The troubleshoot guide for 4.2 has this excerpt: "Ensure that the configurations of GPFS and the remote key management (RKM) server are compatible when it comes to the version of the TLS protocol used upon key retrieval (GPFS uses the nistCompliance configuration variable to control that). In particular, if nistCompliance=SP800-131A is set in GPFS, ensure that the TLS v1.2 protocol is enabled in the RKM server. If this does not resolve the issue, contact the IBM Support Center.". So, how am I to know that nistCompliance=off is even an option? For backward compatibility with the older storage clusters on 3.5 the clients cluster need to have nistCompliance=off I hope this helps the fellows in mixed versions environments, since it's not obvious from the 3.5/4.1 installation manuals or the troubleshoots guide what we should do. Thanks everyone for the help. Jaime Quoting "Uwe Falke" : > Hi, Jaime, > I'd suggest you trace a client while trying to connect and check what > addresses it is going to talk to actually. It is a bit tedious, but you > will be able to find this in the trace report file. You might also get an > idea what's going wrong... > > > > Mit freundlichen Gr??en / Kind regards > > > Dr. 
Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thomas Wolter > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 05/08/2017 06:06 PM > Subject: [gpfsug-discuss] help with multi-cluster setup: Network is > unreachable > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > We have a setup in which "cluster 0" is made up of clients only on > gpfs v4.1, ie, no NDS's or formal storage on this primary membership. > > All storage for those clients come in a multi-cluster fashion, from > clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). > > We recently added a new storage cluster 4 (4.1.1-14), and for some > obscure reason we keep getting "Network is unreachable" during mount > by clients, even though there were no issues or errors with the > multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' > worked fine, and all clients have an entry in /etc/fstab for the file > system associated with the new cluster 4. The weird thing is that we > can mount cluster 3 fine (also 4.1). > > Another piece og information is that as far as GPFS goes all clusters > are configured to communicate exclusively over Infiniband, each on a > different 10.20.x.x network, but broadcast 10.20.255.255. As far as > the IB network goes there are no problems routing/pinging around all > the clusters. So this must be internal to GPFS. > > None of the clusters have the subnet parameter set explicitly at > configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem > we need to. All have cipherList AUTHONLY. One difference is that > cluster 4 has DMAPI enabled (don't think it matters). > > Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients > during mount (10.20.179.1 is one of the NDS on cluster 4): > Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node > 10.20.179.1 failed with error 447 (client side). > Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster > wosgpfs.wos-gateway01-ib0 > Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount > wosgpfs.wos-gateway01-ib0:wosgpfs > Mon May 8 11:35:28.783 2017: Network is unreachable > > > I see this reference to "TLS handshake" and error 447, however > according to the manual this TLS is only set to be default on 4.2 > onwards, not 4.1.1-14 that we have now, where it's supposed to be EMPTY. 
> > mmdiag --network for some of the client gives this excerpt (broken > status): > tapenode-ib0 10.20.83.5 > broken 233 -1 0 0 Linux/L > gpc-f114n014-ib0 10.20.114.14 > broken 233 -1 0 0 Linux/L > gpc-f114n015-ib0 10.20.114.15 > broken 233 -1 0 0 Linux/L > gpc-f114n016-ib0 10.20.114.16 > broken 233 -1 0 0 Linux/L > wos-gateway01-ib0 10.20.179.1 > broken 233 -1 0 0 Linux/L > > > > I guess I just need a hint on how to troubleshoot this situation (the > 4.1 troubleshoot guide is not helping). > > Thanks > Jaime > > > > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Robert.Oesterlin at nuance.com Wed May 10 15:13:56 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 May 2017 14:13:56 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> I could not find any way to find out what the issue is here - ideas? [root]# mmhealth cluster show nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. I?ve tried it multiple times, always returns this error. I recently switched the cluster over to 4.2.2 Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed May 10 16:46:21 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 10 May 2017 11:46:21 -0400 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> References: <1AA3F1C6-8677-460F-B61D-FE87DDC5D8AB@nuance.com> Message-ID: <3939.1494431181@turing-police.cc.vt.edu> On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed May 10 16:52:35 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 10 May 2017 15:52:35 +0000 Subject: [gpfsug-discuss] patched rsync question Message-ID: <27CCB813-DF05-49A6-A510-51499DFF4B85@vanderbilt.edu> Hi All, We are using the patched version of rsync: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, gpfs, iconv, symtimes rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details. to copy files from our old GPFS filesystem to our new GPFS filesystem. Unfortunately, for one group I inadvertently left off the ?-A? option when rsync?ing them, so it didn?t preserve their ACL?s. The original files were deleted, but we were able to restore them from a backup taken on April 25th. I looked, but cannot find any option to rsync that would only update based on ACL?s / permissions. Out of 13,000+ files, it appears that 910 have been modified in the interim. So what I am thinking of doing is rerunning the rsync from the restore directory to the new filesystem directory with the -A option. I?ll test this with ??dry-run? first, of course. I am thinking that this will update the ACL?s on all but the 910 modified files, which would then have to be dealt with on a case by case basis. Anyone have any comments on this idea or any better ideas? Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed May 10 17:20:39 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 May 2017 16:20:39 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Yea, it?s fine. I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. Seems a bit fragile :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? From Kevin.Buterbaugh at Vanderbilt.Edu Wed May 10 18:57:11 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 10 May 2017 17:57:11 +0000 Subject: [gpfsug-discuss] patched rsync question In-Reply-To: References: <27CCB813-DF05-49A6-A510-51499DFF4B85@vanderbilt.edu> Message-ID: Hi Stephen, Thanks for the suggestion. 
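For anyone following along, a runnable version of that per-file approach could look like the sketch below. It assumes the ACLs involved are POSIX-draft ACLs that getfacl/setfacl understand (files carrying NFSv4 ACLs would need mmgetacl/mmputacl instead), and both paths are placeholders:

cd /restore/location || exit 1
find . -print0 | while IFS= read -r -d '' f; do
    getfacl "$f" | setfacl --set-file=- "/new/filesystem/$f"   # copies only the ACL, leaves the data alone
done

setfacl treats the comment header that getfacl prints as comments, so nothing needs to be stripped in between.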
We thought about doing something similar to this but in the end I just ran a: rsync -aAvu /old/location /new/location And that seems to have updated the ACL?s on everything except the 910 modified files, which we?re dealing with in a manner similar to what you suggest below. Thanks all? Kevin On May 10, 2017, at 12:51 PM, Stephen Ulmer > wrote: If there?s only 13K files, and you don?t want to copy them, why use rsync at all? I think your solution is: * check every restored for for an ACL * copy the ACL to the same file in the new file system What about generating a file list and then just traversing it dumping the ACL from the restored file and adding it to the new file (after transforming the path). You could probably do the dump/assign with a pipe and not even write the ACLs down. You can even multi-thread the process if you have GNU xargs. Something like (untested): xargs -P num_cores_or_something ./helper_script.sh < list_of_files Where helper_script.sh is (also untested): NEWPATH=$( echo $1 | sed -e ?s/remove/replace/' ) getfacl $1 | setfacl $NEWPATH -- Stephen On May 10, 2017, at 11:52 AM, Buterbaugh, Kevin L > wrote: Hi All, We are using the patched version of rsync: rsync version 3.0.9 protocol version 30 Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others. Web site: http://rsync.samba.org/ Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, gpfs, iconv, symtimes rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details. to copy files from our old GPFS filesystem to our new GPFS filesystem. Unfortunately, for one group I inadvertently left off the ?-A? option when rsync?ing them, so it didn?t preserve their ACL?s. The original files were deleted, but we were able to restore them from a backup taken on April 25th. I looked, but cannot find any option to rsync that would only update based on ACL?s / permissions. Out of 13,000+ files, it appears that 910 have been modified in the interim. So what I am thinking of doing is rerunning the rsync from the restore directory to the new filesystem directory with the -A option. I?ll test this with ??dry-run? first, of course. I am thinking that this will update the ACL?s on all but the 910 modified files, which would then have to be dealt with on a case by case basis. Anyone have any comments on this idea or any better ideas? Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From usa-principal at gpfsug.org Wed May 10 21:01:05 2017 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Wed, 10 May 2017 13:01:05 -0700 Subject: [gpfsug-discuss] Presentations Uploaded - SSUG Event @NERSC April 4-5 Message-ID: <7501c112d2e6ff79f9c89907a292ddab@webmail.gpfsug.org> All, I have just updated the Presentations page with 19 talks from the US SSUG event last month. The videos should be available on YouTube soon. I'll announce that separately. 
https://www.spectrumscale.org/presentations/ Cheers, Kristy From Anna.Wagner at de.ibm.com Thu May 11 12:28:22 2017 From: Anna.Wagner at de.ibm.com (Anna Christina Wagner) Date: Thu, 11 May 2017 13:28:22 +0200 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> References: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Message-ID: Hello Bob, 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. So a short explanation: We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not know, that it is the CSM and will not start the corresponding service for that. If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) Mit freundlichen Gr??en / Kind regards Wagner, Anna Christina Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 10.05.2017 18:21 Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error Sent by: gpfsug-discuss-bounces at spectrumscale.org Yea, it?s fine. I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. Seems a bit fragile :-) Bob Oesterlin Sr Principal Storage Engineer, Nuance On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > [root]# mmhealth cluster show > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. It may be in an failover process. Please try again in a few seconds. Does 'mmlsmgr' return something sane? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu May 11 13:05:14 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 11 May 2017 08:05:14 -0400 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error In-Reply-To: References: <4CE7EC70-E144-4B93-9747-49C422787785@nuance.com> Message-ID: I?ve also been exploring the mmhealth and gpfsgui for the first time this week. I have a test cluster where I?m trying the new stuff. 
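When mmhealth cluster show misbehaves the way the earlier messages describe, a few checks that do not depend on the cluster state manager are worth running first; a sketch only, built from commands already mentioned in this thread:

mmlsmgr                   # which node is the cluster manager (the CSM runs there)
mmhealth node show        # the per-node view still works without a CSM
mmsysmoncontrol restart   # on the cluster manager, restart the monitor if it never picked up the CSM role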
Running 4.2.2-2 mmhealth cluster show says everyone is in nominal status: Component Total Failed Degraded Healthy Other ------------------------------------------------------------------------------------- NODE 12 0 0 12 0 GPFS 12 0 0 12 0 NETWORK 12 0 0 12 0 FILESYSTEM 0 0 0 0 0 DISK 0 0 0 0 0 GUI 1 0 0 1 0 PERFMON 12 0 0 12 0 However on the GUI there is conflicting information: 1) Home page shows 3/8 NSD Servers unhealthy 2) Home page shows 3/21 Nodes unhealthy ? where is it getting this notion? ? there are only 12 nodes in the whole cluster! 3) clicking on either NSD Servers or Nodes leads to the monitoring page where the top half spins forever, bottom half is content-free. I may have installed the pmsensors RPM on a couple of other nodes back in early April, but have forgotten which ones. They are in the production cluster. Also, the storage in this sandbox cluster has not been turned into a filesystem yet. There are a few dozen free NSDs. Perhaps the ?FILESYSTEM CHECKING? status is somehow wedging up the GUI? Node name: storage005.oscar.ccv.brown.edu Node status: HEALTHY Status Change: 15 hours ago Component Status Status Change Reasons ------------------------------------------------------ GPFS HEALTHY 16 hours ago - NETWORK HEALTHY 16 hours ago - FILESYSTEM CHECKING 16 hours ago - GUI HEALTHY 15 hours ago - PERFMON HEALTHY 16 hours ago I?ve tried restarting the GUI service and also rebooted the GUI server, but it comes back looking the same. Any thoughts? > On May 11, 2017, at 7:28 AM, Anna Christina Wagner wrote: > > Hello Bob, > > 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. > > So a short explanation: > We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands > took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager > was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not > know, that it is the CSM and will not start the corresponding service for that. > > > If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) > > Mit freundlichen Gr??en / Kind regards > > Wagner, Anna Christina > > Software Engineer, Spectrum Scale Development > IBM Systems > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz > Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 > > > > From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 10.05.2017 18:21 > Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Yea, it?s fine. > > I did manage to get it to respond after I did a ?mmsysmoncontrol restart? but it?s still not showing proper status across the cluster. > > Seems a bit fragile :-) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > > On 5/10/17, 10:46 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: > > On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said: > > > [root]# mmhealth cluster show > > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state manager. 
It may be in an failover process. Please try again in a few seconds. > > Does 'mmlsmgr' return something sane? > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu May 11 13:36:47 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 11 May 2017 12:36:47 +0000 Subject: [gpfsug-discuss] "mmhealth cluster show" returns error Message-ID: <9C601DFD-16FF-40E7-8D46-16033C443428@nuance.com> Thanks Anna, I will email you directly. Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 From: on behalf of Anna Christina Wagner Reply-To: gpfsug main discussion list Date: Thursday, May 11, 2017 at 6:28 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] "mmhealth cluster show" returns error Hello Bob, 4.2.2 is the release were we introduced "mmhealth cluster show". And you are totally right, it can be a little fragile at times. So a short explanation: We had this situation on test machines as well. Because of issues with the system not only the mm-commands but also usual Linux commands took more than 10 seconds to return. We have internally a default time out of 10 seconds for cli commands. So if you had a failover situation, in which the cluster manager was changed (we have our cluster state manager (CSM) on the cluster manager) and the mmlsmgr command did not return in 10 seconds the node does not know, that it is the CSM and will not start the corresponding service for that. If you want me to look further into it or if you have feedback regarding mmhealth please feel free to send me an email (Anna.Wagner at de.ibm.com) Mit freundlichen Gr??en / Kind regards Wagner, Anna Christina Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christian.Fey at sva.de Thu May 11 16:37:43 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Thu, 11 May 2017 15:37:43 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Message-ID: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. 
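The slot arithmetic is easy to check: autorid carves the configured range into rangesize-wide slots and, as the net idmap output just below shows, the first slot goes to the ALLOC pool and the second to BUILTIN, so the base address of each slot can be predicted. A quick sketch, assuming slots really are handed out in that order:

low=200000; size=800000
for slot in 0 1 2; do
    echo "slot $slot: $((low + slot*size)) - $((low + (slot+1)*size - 1))"
done
# slot 0:  200000 -  999999   -> ALLOC pool
# slot 1: 1000000 - 1799999   -> BUILTIN (S-1-5-32)
# slot 2: 1800000 - 2599999   -> first AD domain

With these values the first domain therefore cannot start at 1000000, because the ALLOC and BUILTIN ranges consume the first two slots.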
Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu May 11 18:49:02 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 11 May 2017 17:49:02 +0000 Subject: [gpfsug-discuss] Edge case failure mode Message-ID: Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. 
We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon From bbanister at jumptrading.com Thu May 11 18:58:18 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 11 May 2017 17:58:18 +0000 Subject: [gpfsug-discuss] Edge case failure mode In-Reply-To: References: Message-ID: <87b204b6e245439bb475792cf3672aa5@jumptrading.com> Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA56.41F66270] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). 
That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From S.J.Thompson at bham.ac.uk Thu May 11 19:05:08 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 11 May 2017 18:05:08 +0000 Subject: [gpfsug-discuss] Edge case failure mode Message-ID: Cheers Bryan ... 
http://goo.gl/YXitIF Points to: (Outlook/mailing list is line breaking and cutting the trailing 0) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030 Simon From: > on behalf of "bbanister at jumptrading.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 11 May 2017 at 18:58 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Edge case failure mode Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA56.41F66270] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. (Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. 
Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From pinto at scinet.utoronto.ca Thu May 11 19:17:06 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 11 May 2017 14:17:06 -0400 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation In-Reply-To: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> Message-ID: <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drove people's attention. I hope to get some comments now. Thanks Jaime Quoting "Jaime Pinto" : > In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota > once a month, usually after the massive monthly purge. > > I noticed that starting with the GSS and ESS appliances under 3.5 that > I needed to run mmcheckquota more often, at least once a week, or as > often as daily, to clear the slippage errors in the accounting > information, otherwise users complained that they were hitting their > quotas, even throughout they deleted a lot of stuff. > > More recently we adopted a G200 appliance (1.8PB), with v4.1, and now > things have gotten worst, and I have to run it twice daily, just in > case. > > So, what I am missing? Is their a parameter since 3.5 and through 4.1 > that we can set, so that GPFS will reconcile the quota accounting > internally more often and on its own? 
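(Whatever the underlying cause turns out to be, the twice-daily run described above can at least be scheduled rather than run by hand; a sketch, with gpfs0 a placeholder for the real device name:

# /etc/cron.d/mmcheckquota -- reconcile quota accounting morning and evening
0 6,18 * * * root /usr/lpp/mmfs/bin/mmcheckquota gpfs0 >> /var/log/mmcheckquota.log 2>&1

mmcheckquota -a would cover every mounted filesystem in one pass, at the cost of a longer scan.)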
> > Thanks > Jaime > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From bbanister at jumptrading.com Thu May 11 19:20:47 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 11 May 2017 18:20:47 +0000 Subject: [gpfsug-discuss] Edge case failure mode In-Reply-To: References: Message-ID: <607e7c81dd3349fd8c0a8602d1938e3b@jumptrading.com> I was wondering why that 0 was left on that line alone... hahaha, -B From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 1:05 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Edge case failure mode Cheers Bryan ... http://goo.gl/YXitIF Points to: (Outlook/mailing list is line breaking and cutting the trailing 0) https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=105030 Simon From: > on behalf of "bbanister at jumptrading.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 11 May 2017 at 18:58 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Edge case failure mode Hey Simon, I clicked your link but I think it went to a page that is not about this RFE: [cid:image001.png at 01D2CA59.65CF7300] Cheers, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, May 11, 2017 12:49 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Edge case failure mode Just following up on some discussions we had at the UG this week. I mentioned a few weeks back that we were having issues with failover of NFS, and we figured a work around to our clients for this so that failover works great now (plus there is some code fixes coming down the line as well to help). Here's my story of fun with protocol nodes ... Since then we've occasionally been seeing the load average of 1 CES node rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients. A lot of digging and we found that CTDB was reporting > 80% memory used, so we tweaked the page pool down to solve this. Great we thought ... But alas that wasn't the cause. Just to be clear 95% of the time, the CES node is fine, I can do and ls in the mounted file-systems and all is good. When the load rises to 400, an ls takes 20-30 seconds, so they are related, but what is the initial cause? Other CES nodes are 100% fine and if we do mmces node suspend, and then resume all is well on the node (and no other CES node assumes the problem as the IP moves). Its not always the same CES IP, node or even data centre, and most of the time is looks fine. I logged a ticket with OCF today, and one thing they suggested was to disable NFSv3 as they've seen similar behaviour at another site. As far as I know, all my NFS clients are v4, but sure we disable v3 anyway as its not actually needed. 
(Both at the ganesha layer, change the default for exports and reconfigure all existing exports to v4 only for good measure). That didn't help, but certainly worth a try! Note that my CES cluster is multi-cluster mounting the file-systems and from the POSIX side, its fine most of the time. We've used the mmnetverify command to check that all is well as well. Of course this only checks the local cluster, not remote nodes, but as we aren't seeing expels and can access the FS, we assume that the GPFS layer is working fine. So we finally log a PMR with IBM, I catch a node in a broken state and pull a trace from it and upload that, and ask what other traces they might want (apparently there is no protocol trace for NFS in 4.2.2-3). Now, when we run this, I note that its doing things like mmlsfileset to the remote storage, coming from two clusters and some of this is timing out. We've already had issues with rp_filter on remote nodes causing expels, but the storage backend here has only 1 nic, and we can mount and access it all fine. So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not GPFS ping of course), but not make "admin" calls to it. Ssh appears to work fine as well BTW to it. So I check on my CES and this is multi-homed and rp_filter is enabled. Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm sure I'm an edge case, but it would be REALLY REALLY helpful to get mmnetverify to work across a cluster (e.g. I say this is a remote node and here's its FQDN, can you talk to it) which would have helped with diagnosis here. I'm not entirely sure why ssh etc would work and pass rp_filter, but not GPFS traffic (in some cases apparently), but I guess its something to do with how GPFS is binding and then the kernel routing layer. I'm still not sure if this is my root cause as the occurrences of the high load are a bit random (anything from every hour to being stable for 2-3 days), but since making the rp_filter change this afternoon, so far ...? I've created an RFE for mmnetverify to be able to test across a cluster... https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10503 0 Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 36506 bytes Desc: image001.png URL: From UWEFALKE at de.ibm.com Thu May 11 20:42:29 2017 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Thu, 11 May 2017 21:42:29 +0200 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quotaaccountingreconciliation In-Reply-To: <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Message-ID: Hi, Jaimie, we got the same problem, also with a GSS although I suppose it's rather to do with the code above GNR, but who knows. I have a PMR open for quite some time (and had others as well). Seems like things improved by upgrading the FS version, but atre not gone. However, these issues are to be solved via PMRs. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" To: "gpfsug main discussion list" , "Jaime Pinto" Date: 05/11/2017 08:17 PM Subject: Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation Sent by: gpfsug-discuss-bounces at spectrumscale.org Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drove people's attention. I hope to get some comments now. Thanks Jaime Quoting "Jaime Pinto" : > In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota > once a month, usually after the massive monthly purge. > > I noticed that starting with the GSS and ESS appliances under 3.5 that > I needed to run mmcheckquota more often, at least once a week, or as > often as daily, to clear the slippage errors in the accounting > information, otherwise users complained that they were hitting their > quotas, even throughout they deleted a lot of stuff. > > More recently we adopted a G200 appliance (1.8PB), with v4.1, and now > things have gotten worst, and I have to run it twice daily, just in > case. > > So, what I am missing? 
Is their a parameter since 3.5 and through 4.1 > that we can set, so that GPFS will reconcile the quota accounting > internally more often and on its own? > > Thanks > Jaime > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From damir.krstic at gmail.com Fri May 12 11:42:19 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Fri, 12 May 2017 10:42:19 +0000 Subject: [gpfsug-discuss] connected v. datagram mode Message-ID: I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and are in datagram mode. In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. What is is the right thing to do? Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Fri May 12 12:43:01 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 12 May 2017 07:43:01 -0400 Subject: [gpfsug-discuss] BIG LAG since 3.5 on quotaaccountingreconciliation In-Reply-To: References: <20170331101834.162212izrlj4swu2@support.scinet.utoronto.ca> <20170511141706.942536htkinc957m@support.scinet.utoronto.ca> Message-ID: <20170512074301.91955kiad218rl51@support.scinet.utoronto.ca> I like to give the community a chance to reflect on the issue, check their own installations and possibly give us all some comments. If in a few more days we still don't get any hints I'll have to open a couple of support tickets (IBM, DDN, Seagate, ...). Cheers Jaime Quoting "Uwe Falke" : > Hi, Jaimie, > > we got the same problem, also with a GSS although I suppose it's rather to > do with the code above GNR, but who knows. > I have a PMR open for quite some time (and had others as well). > Seems like things improved by upgrading the FS version, but atre not gone. > > > However, these issues are to be solved via PMRs. > > Mit freundlichen Gr??en / Kind regards > > > Dr. Uwe Falke > > IT Specialist > High Performance Computing Services / Integrated Technology Services / > Data Center Services > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > Rathausstr. 
7 > 09111 Chemnitz > Phone: +49 371 6978 2165 > Mobile: +49 175 575 2877 > E-Mail: uwefalke at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: > Andreas Hasse, Thomas Wolter > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 17122 > > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Jaime Pinto" > Date: 05/11/2017 08:17 PM > Subject: Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting > reconciliation > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Just bumping up. > When I first posted this subject at the end of March there was a UG > meeting that drove people's attention. > > I hope to get some comments now. > > Thanks > Jaime > > Quoting "Jaime Pinto" : > >> In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota >> once a month, usually after the massive monthly purge. >> >> I noticed that starting with the GSS and ESS appliances under 3.5 that >> I needed to run mmcheckquota more often, at least once a week, or as >> often as daily, to clear the slippage errors in the accounting >> information, otherwise users complained that they were hitting their >> quotas, even throughout they deleted a lot of stuff. >> >> More recently we adopted a G200 appliance (1.8PB), with v4.1, and now >> things have gotten worst, and I have to run it twice daily, just in >> case. >> >> So, what I am missing? Is their a parameter since 3.5 and through 4.1 >> that we can set, so that GPFS will reconcile the quota accounting >> internally more often and on its own? >> >> Thanks >> Jaime >> > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jonathon.anderson at colorado.edu Fri May 12 15:43:55 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 12 May 2017 14:43:55 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: Message-ID: This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? 
and detects whether data is received completely and in the correct order. The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. ~jonathon On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and are in datagram mode. In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. What is is the right thing to do? Thanks in advance. Damir From aaron.s.knister at nasa.gov Fri May 12 15:48:14 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 12 May 2017 10:48:14 -0400 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: Message-ID: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> For what it's worth we've seen *significantly* better performance of streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. -Aaron On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received completely and in the correct order. > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. > > ~jonathon > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > are in datagram mode. > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > What is is the right thing to do? > > > Thanks in advance. > Damir > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From janfrode at tanso.net Fri May 12 16:03:03 2017 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Fri, 12 May 2017 15:03:03 +0000 Subject: [gpfsug-discuss] connected v. 
datagram mode In-Reply-To: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: -------------- Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. --------------- -jf fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : > For what it's worth we've seen *significantly* better performance of > streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. > > -Aaron > > On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > > This won?t tell you which to use; but datagram mode and connected mode > in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that > there?s no checking/retry built into the protocol; the other is ?reliable? > and detects whether data is received completely and in the correct order. > > > > The last advice I heard for traditional IB was that the overhead of > connected mode isn?t worth it, particularly if you?re using IPoIB (where > you?re likely to be using TCP anyway). That said, on our OPA network we?re > seeing the opposite advice; so I, to, am often unsure what the most correct > configuration would be for any given fabric. > > > > ~jonathon > > > > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Damir Krstic" behalf of damir.krstic at gmail.com> wrote: > > > > I never fully understood the difference between connected v. > datagram mode beside the obvious packet size difference. Our NSD servers > (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our > 700+ clients are running RH6 and > > are in datagram mode. > > > > > > In a month we are upgrading our cluster to RedHat 7 and are debating > whether to leave the compute nodes in datagram mode or whether to switch > them to connected mode. > > What is is the right thing to do? > > > > > > Thanks in advance. > > Damir > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Fri May 12 16:05:47 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 12 May 2017 15:05:47 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: It may be true that you should always favor connected mode; but those instructions look like they?re specifically only talking about when you have bonded interfaces. 
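(Bonded or not, the switch itself is the same ifcfg knob the ESS guide quotes, together with the larger MTU that connected mode permits; a sketch for RHEL, with ib0 standing in for whatever the IPoIB interface is called:

# /etc/sysconfig/network-scripts/ifcfg-ib0
CONNECTED_MODE=yes
MTU=65520

# check the running state without restarting anything
cat /sys/class/net/ib0/mode    # prints "connected" or "datagram"
ip link show ib0               # datagram mode is limited to the IB MTU (typically 2044), connected mode allows up to 65520

Which mode actually wins on a given fabric still seems best settled by measurement, as suggested further down the thread.)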
~jonathon On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jan-Frode Myklebust" wrote: I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: -------------- Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. --------------- -jf fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : For what it's worth we've seen *significantly* better performance of streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. -Aaron On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received completely and in the correct order. > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am often unsure what the most correct configuration would be for any given fabric. > > ~jonathon > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" wrote: > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > are in datagram mode. > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > What is is the right thing to do? > > > Thanks in advance. > Damir > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From usa-principal at gpfsug.org Fri May 12 17:03:46 2017 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Fri, 12 May 2017 09:03:46 -0700 Subject: [gpfsug-discuss] YouTube Videos of Talks - April 4-5 US SSUG Meeting at NERSC Message-ID: All, The YouTube videos are now available on the Spectrum Scale/GPFS User Group channel, and will be on the IBM channel as well in the near term. https://www.youtube.com/playlist?list=PLrdepxEIEyCp1TqZ2z3WfGOgqO9oY01xY Cheers, Kristy From laurence at qsplace.co.uk Sat May 13 00:27:19 2017 From: laurence at qsplace.co.uk (Laurence Horrocks-Barlow) Date: Sat, 13 May 2017 00:27:19 +0100 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: It also depends on the adapter. 
We have seen better performance using datagram with MLNX adapters however we see better in connected mode when using Intel truescale. Again as Jonathon has mentioned we have also seen better performance when using connected mode on active/slave bonded interface (even between a mixed MLNX/TS fabric). There is also a significant difference in the MTU size you can use in datagram vs connected mode, with datagram being limited to 2044 (if memory serves) there as connected mode can use 65536 (again if memory serves). I typically now run qperf and nsdperf benchmarks to find the best configuration. -- Lauz On 12/05/2017 16:05, Jonathon A Anderson wrote: > It may be true that you should always favor connected mode; but those instructions look like they?re specifically only talking about when you have bonded interfaces. > > ~jonathon > > > On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jan-Frode Myklebust" wrote: > > > > > I also don't know much about this, but the ESS quick deployment guide is quite clear on the we should use connected mode for IPoIB: > > -------------- > Note: If using bonded IP over IB, do the following: Ensure that the CONNECTED_MODE=yes statement exists in the corresponding slave-bond interface scripts located in /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These > scripts are created as part of the IP over IB bond creation. An example of the slave-bond interface with the modification is shown below. > --------------- > > > -jf > fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister : > > > For what it's worth we've seen *significantly* better performance of > streaming benchmarks of IPoIB with connected mode vs datagram mode on IB. > > -Aaron > > On 5/12/17 10:43 AM, Jonathon A Anderson wrote: > > This won?t tell you which to use; but datagram mode and connected mode in IB is roughly analogous to UDB vs TCP in IP. One is ?unreliable? in that there?s no checking/retry built into the protocol; the other is ?reliable? and detects whether data is received > completely and in the correct order. > > > > The last advice I heard for traditional IB was that the overhead of connected mode isn?t worth it, particularly if you?re using IPoIB (where you?re likely to be using TCP anyway). That said, on our OPA network we?re seeing the opposite advice; so I, to, am > often unsure what the most correct configuration would be for any given fabric. > > > > ~jonathon > > > > > > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Damir Krstic" on behalf of damir.krstic at gmail.com> wrote: > > > > I never fully understood the difference between connected v. datagram mode beside the obvious packet size difference. Our NSD servers (ESS GL6 nodes) are installed with RedHat 7 and are in connected mode. Our 700+ clients are running RH6 and > > are in datagram mode. > > > > > > In a month we are upgrading our cluster to RedHat 7 and are debating whether to leave the compute nodes in datagram mode or whether to switch them to connected mode. > > What is is the right thing to do? > > > > > > Thanks in advance. 
> > Damir > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at > spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stijn.deweirdt at ugent.be Sun May 14 10:16:12 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Sun, 14 May 2017 11:16:12 +0200 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: hi all, does anyone know about the impact of memory usage? afaik, connected mode keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2) instructions suggested not to use CM for large-ish (>128 nodes at that time) clusters. we never turned it back on, and now have 700 nodes. wrt IPoIB MTU, UD can have up to 4042 (or something like that) with correct opensm configuration. stijn On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote: > It also depends on the adapter. > > We have seen better performance using datagram with MLNX adapters > however we see better in connected mode when using Intel truescale. > Again as Jonathon has mentioned we have also seen better performance > when using connected mode on active/slave bonded interface (even between > a mixed MLNX/TS fabric). > > There is also a significant difference in the MTU size you can use in > datagram vs connected mode, with datagram being limited to 2044 (if > memory serves) there as connected mode can use 65536 (again if memory > serves). > > I typically now run qperf and nsdperf benchmarks to find the best > configuration. > > -- Lauz > > On 12/05/2017 16:05, Jonathon A Anderson wrote: >> It may be true that you should always favor connected mode; but those >> instructions look like they?re specifically only talking about when >> you have bonded interfaces. >> >> ~jonathon >> >> >> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of Jan-Frode Myklebust" >> > janfrode at tanso.net> wrote: >> >> I also don't know much about this, but the ESS >> quick deployment guide is quite clear on the we should use connected >> mode for IPoIB: >> -------------- >> Note: If using bonded IP over IB, do the following: Ensure that >> the CONNECTED_MODE=yes statement exists in the corresponding >> slave-bond interface scripts located in /etc/sysconfig/network-scripts >> directory of the management server and I/O server nodes. These >> scripts are created as part of the IP over IB bond creation. An >> example of the slave-bond interface with the modification is shown below. >> --------------- >> -jf >> fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister >> : >> For what it's worth we've seen *significantly* better >> performance of >> streaming benchmarks of IPoIB with connected mode vs datagram >> mode on IB. >> -Aaron >> On 5/12/17 10:43 AM, Jonathon A Anderson wrote: >> > This won?t tell you which to use; but datagram mode and >> connected mode in IB is roughly analogous to UDB vs TCP in IP. One is >> ?unreliable? 
in that there?s no checking/retry built into the >> protocol; the other is ?reliable? and detects whether data is received >> completely and in the correct order. >> > >> > The last advice I heard for traditional IB was that the >> overhead of connected mode isn?t worth it, particularly if you?re >> using IPoIB (where you?re likely to be using TCP anyway). That said, >> on our OPA network we?re seeing the opposite advice; so I, to, am >> often unsure what the most correct configuration would be for >> any given fabric. >> > >> > ~jonathon >> > >> > >> > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org >> on behalf of Damir Krstic" > on behalf of damir.krstic at gmail.com> wrote: >> > >> > I never fully understood the difference between connected >> v. datagram mode beside the obvious packet size difference. Our NSD >> servers (ESS GL6 nodes) are installed with RedHat 7 and are in >> connected mode. Our 700+ clients are running RH6 and >> > are in datagram mode. >> > >> > >> > In a month we are upgrading our cluster to RedHat 7 and are >> debating whether to leave the compute nodes in datagram mode or >> whether to switch them to connected mode. >> > What is is the right thing to do? >> > >> > >> > Thanks in advance. >> > Damir >> > >> > >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at >> spectrumscale.org >> > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at >> spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Mon May 15 00:41:13 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 14 May 2017 23:41:13 +0000 Subject: [gpfsug-discuss] connected v. datagram mode In-Reply-To: References: <8782b996-eb40-6479-c509-9009ac24dbd7@nasa.gov> Message-ID: <82aac761681744b28e7010f22ef7cb81@exch1-cdc.nexus.csiro.au> I asked Mellanox about this nearly 2 years ago and was told around the 500 node mark there will be a tipping point and that datagram will be more useful after that. Memory utilisation was the issue. I've also seen references to smaller node counts more recently as well as generic recommendations to use datagram for any size cluster. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stijn De Weirdt Sent: Sunday, 14 May 2017 7:16 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] connected v. datagram mode hi all, does anyone know about the impact of memory usage? afaik, connected mode keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2) instructions suggested not to use CM for large-ish (>128 nodes at that time) clusters. we never turned it back on, and now have 700 nodes. wrt IPoIB MTU, UD can have up to 4042 (or something like that) with correct opensm configuration. 
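As an aside on that last point: with a Mellanox subnet manager, the 4K IB MTU needed for the larger UD IPoIB MTU is normally enabled in the opensm partition configuration. A minimal sketch, assuming the default partition file and default partition key (both assumptions, check your own opensm setup):

# /etc/opensm/partitions.conf -- illustrative sketch only
# mtu=5 selects a 4K IB MTU; IPoIB UD then gets roughly a 4092-byte MTU (IB MTU minus the 4-byte IPoIB header)
Default=0x7fff, ipoib, mtu=5 : ALL=full;

opensm then needs a restart on the subnet manager node for the change to take effect.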
stijn On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote: > It also depends on the adapter. > > We have seen better performance using datagram with MLNX adapters > however we see better in connected mode when using Intel truescale. > Again as Jonathon has mentioned we have also seen better performance > when using connected mode on active/slave bonded interface (even > between a mixed MLNX/TS fabric). > > There is also a significant difference in the MTU size you can use in > datagram vs connected mode, with datagram being limited to 2044 (if > memory serves) there as connected mode can use 65536 (again if memory > serves). > > I typically now run qperf and nsdperf benchmarks to find the best > configuration. > > -- Lauz > > On 12/05/2017 16:05, Jonathon A Anderson wrote: >> It may be true that you should always favor connected mode; but those >> instructions look like they?re specifically only talking about when >> you have bonded interfaces. >> >> ~jonathon >> >> >> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on >> behalf of Jan-Frode Myklebust" >> > janfrode at tanso.net> wrote: >> >> I also don't know much about this, but the ESS >> quick deployment guide is quite clear on the we should use connected >> mode for IPoIB: >> -------------- >> Note: If using bonded IP over IB, do the following: Ensure that >> the CONNECTED_MODE=yes statement exists in the corresponding >> slave-bond interface scripts located in >> /etc/sysconfig/network-scripts directory of the management server and I/O server nodes. These >> scripts are created as part of the IP over IB bond creation. An >> example of the slave-bond interface with the modification is shown below. >> --------------- >> -jf >> fre. 12. mai 2017 kl. 16.48 skrev Aaron Knister >> : >> For what it's worth we've seen *significantly* better >> performance of >> streaming benchmarks of IPoIB with connected mode vs datagram >> mode on IB. >> -Aaron >> On 5/12/17 10:43 AM, Jonathon A Anderson wrote: >> > This won?t tell you which to use; but datagram mode and >> connected mode in IB is roughly analogous to UDB vs TCP in IP. One is >> ?unreliable? in that there?s no checking/retry built into the >> protocol; the other is ?reliable? and detects whether data is received >> completely and in the correct order. >> > >> > The last advice I heard for traditional IB was that the >> overhead of connected mode isn?t worth it, particularly if you?re >> using IPoIB (where you?re likely to be using TCP anyway). That said, >> on our OPA network we?re seeing the opposite advice; so I, to, am >> often unsure what the most correct configuration would be for >> any given fabric. >> > >> > ~jonathon >> > >> > >> > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org >> on behalf of Damir Krstic" > on behalf of damir.krstic at gmail.com> wrote: >> > >> > I never fully understood the difference between connected >> v. datagram mode beside the obvious packet size difference. Our NSD >> servers (ESS GL6 nodes) are installed with RedHat 7 and are in >> connected mode. Our 700+ clients are running RH6 and >> > are in datagram mode. >> > >> > >> > In a month we are upgrading our cluster to RedHat 7 and are >> debating whether to leave the compute nodes in datagram mode or >> whether to switch them to connected mode. >> > What is is the right thing to do? >> > >> > >> > Thanks in advance. 
>> > Damir >> > >> > >> > >> > _______________________________________________ >> > gpfsug-discuss mailing list >> > gpfsug-discuss at >> spectrumscale.org >> > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at >> spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From varun.mittal at in.ibm.com Mon May 15 19:39:28 2017 From: varun.mittal at in.ibm.com (Varun Mittal3) Date: Tue, 16 May 2017 00:09:28 +0530 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration In-Reply-To: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune From: "Fey, Christian" To: gpfsug main discussion list Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. 
Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From p.childs at qmul.ac.uk Tue May 16 10:40:09 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Tue, 16 May 2017 10:40:09 +0100 Subject: [gpfsug-discuss] AFM Prefetch Missing Files Message-ID: I know it was said at the User group meeting last week that older versions of afm prefetch miss empty files and that this is now fixed in 4.2.2.3. We are in the middle of trying to migrate our files to a new filesystem, and since that was said I'm double checking for any mistakes etc. Anyway it looks like AFM prefetch also misses symlinks pointing to files that that don't exist. ie "dangling symlinks" or ones that point to files that either have not been created yet or have subsequently been deleted. or when files have been decompressed and a symlink extracted that points somewhere that is never going to exist. I'm still checking this, and as yet it does not look like its a data loss issue, but it could still cause things to not quiet work once the file migration is complete. Does anyone else know of any other types of files that might be missed and I need to be aware of? We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" using a gpfs policy to collect the list, we are using GPFS Multi-cluster to connect the two filesystems not NFS.... Thanks in advanced Peter Childs From service at metamodul.com Tue May 16 20:17:55 2017 From: service at metamodul.com (Hans-Joachim Ehlers) Date: Tue, 16 May 2017 21:17:55 +0200 (CEST) Subject: [gpfsug-discuss] Maximum network delay for a Quorum Buster node Message-ID: <1486746025.249506.1494962275357@email.1und1.de> An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Wed May 17 12:26:44 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 17 May 2017 11:26:44 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Hi All, I help to run a gpfs cluster at the Met Office, Exeter, UK. The cluster is running GPFS 4.2.2.2, it's used with slurm for batch work - primarily for postprocessing weather and climate change model data generated from our HPC. We currently have 8 NSD nodes with approx 3PB of storage with 70+ client nodes. Kind Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neil.wilson at metoffice.gov.uk Wed May 17 12:44:01 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Wed, 17 May 2017 11:44:01 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: Hello all, Does anyone have any experience with troubleshooting the new GPFS GUI? I've got it up and running but have a few weird problems with it... Maybe someone can help or point me in the right direction? 1. It keeps generating an alert saying that the cluster is down, when it isn't?? Event name: gui_cluster_down Component: GUI Entity type: Node Entity name: Event time: 17/05/2017 12:19:29 Message: The GUI detected that the cluster is down. Description: The GUI checks the cluster state. Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. User action: Check why the cluster lost quorum. Reporting node: Event type: Active health state of an entity which is monitored by the system. 2. It is collecting sensor data from the NSD nodes without any issue, but it won't collect sensor data from any of the client nodes? I have the pmsensors package installed on all the nodes in question , the service is enabled and running - the logs showing that it has connected to the collector. However in the GUI it just says "Performance collector did not return any data" 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. Would be great if anyone has any experience or ideas on how to troubleshoot this! Thanks Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed May 17 12:58:15 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Wed, 17 May 2017 07:58:15 -0400 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: References: Message-ID: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> I have issues as well with the gui. The issue that I had most similar to yours came about because I had installed the collector RPM and enabled collectors on two server nodes, but the GUI was only getting data from one of them. Each client randomly selected a collector to deliver data to. So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. Other issue I mentioned before is that it is totally confused about how many nodes are in the cluster (thinks 21, with 3 unhealthy) when there are only 12 nodes in all, all healthy. The nodes dashboard never finishes loading, and no means of digging deeper (text based info) to find out why it is wedged. ? ddj > On May 17, 2017, at 7:44 AM, Wilson, Neil wrote: > > Hello all, > > Does anyone have any experience with troubleshooting the new GPFS GUI? > I?ve got it up and running but have a few weird problems with it... > Maybe someone can help or point me in the right direction? > > 1. It keeps generating an alert saying that the cluster is down, when it isn?t?? > > Event name: > gui_cluster_down > Component: > GUI > Entity type: > Node > Entity name: > Event time: > 17/05/2017 12:19:29 > Message: > The GUI detected that the cluster is down. > Description: > The GUI checks the cluster state. > Cause: > The GUI calculated that an insufficient amount of quorum nodes is up and running. > User action: > Check why the cluster lost quorum. 
> Reporting node: > Event type: > Active health state of an entity which is monitored by the system. > > 2. It is collecting sensor data from the NSD nodes without any issue, but it won?t collect sensor data from any of the client nodes? > I have the pmsensors package installed on all the nodes in question , the service is enabled and running ? the logs showing that it has connected to the collector. > However in the GUI it just says ?Performance collector did not return any data? > > 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. > > > Would be great if anyone has any experience or ideas on how to troubleshoot this! > > Thanks > Neil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed May 17 13:23:48 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 17 May 2017 12:23:48 +0000 Subject: [gpfsug-discuss] GPFS GUI Message-ID: I don?t run the GUI in production, so I can?t comment on those issues specifically. I have been running a federated collector cluster for some time and it?s been working as expected. I?ve been using the Zimon-Grafana bridge code to look at GPFS performance stats. The other part of this is the mmhealth/mmsysmonitor process that reports events. It?s been problematic for me, especially in larger clusters (400+ nodes). The mmsysmonitor process is overloading the master node (the cluster manager) with too many ?heartbeats? and ends up causing lots of issues and log messages. Evidently this is something IBM is aware of (at the 4.2.2-2 level) and they have fixes coming out in 4.2.3 PTF1. I ended up disabling the cluster wide collection of health stats to prevent the cluster manager issues. However, be aware that CES depends on the mmhealth data so tinkering with the config make cause other issues if you use CES. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "David D. Johnson" Reply-To: gpfsug main discussion list Date: Wednesday, May 17, 2017 at 6:58 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS GUI So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Wed May 17 17:00:12 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 17 May 2017 18:00:12 +0200 Subject: [gpfsug-discuss] GPFS GUI In-Reply-To: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> References: <0E0FB0DB-BC4B-40F7-AD49-B365EA57EE65@brown.edu> Message-ID: Hello all, if multiple collectors should work together in a federation, the collector peers need to he specified in the ZimonCollectors.cfg. The GUI will see data from all collectors if federation is set up. See documentation below in the KC (works in 4.2.2 and 4.2.3 alike): https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_federation.htm For the issue related to the nodes count, can you contact me per PN? 
Mit freundlichen Gr??en / Kind regards Markus Rohwedder IBM Spectrum Scale GUI Development From: "David D. Johnson" To: gpfsug main discussion list Date: 17.05.2017 13:59 Subject: Re: [gpfsug-discuss] GPFS GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org I have issues as well with the gui. The issue that I had most similar to yours came about because I had installed the collector RPM and enabled collectors on two server nodes, but the GUI was only getting data from one of them. Each client randomly selected a collector to deliver data to. So how are multiple collectors supposed to work? Active/Passive? Failover pairs? Shared storage? Better not be on GPFS? Maybe there is a place in the gui config to tell it to keep track of multiple collectors, but I gave up looking and turned of the second collector service and removed it from the candidates. Other issue I mentioned before is that it is totally confused about how many nodes are in the cluster (thinks 21, with 3 unhealthy) when there are only 12 nodes in all, all healthy. The nodes dashboard never finishes loading, and no means of digging deeper (text based info) to find out why it is wedged. ? ddj On May 17, 2017, at 7:44 AM, Wilson, Neil < neil.wilson at metoffice.gov.uk> wrote: Hello all, Does anyone have any experience with troubleshooting the new GPFS GUI? I?ve got it up and running but have a few weird problems with it... Maybe someone can help or point me in the right direction? 1. It keeps generating an alert saying that the cluster is down, when it isn?t?? Event name: gui_cluster_down Component: GUI Entity type: Node Entity name: Event time: 17/05/2017 12:19:29 Message: The GUI detected that the cluster is down. Description: The GUI checks the cluster state. Cause: The GUI calculated that an insufficient amount of quorum nodes is up and running. User action: Check why the cluster lost quorum. Reporting node: Event type: Active health state of an entity which is monitored by the system. 2. It is collecting sensor data from the NSD nodes without any issue, but it won?t collect sensor data from any of the client nodes? I have the pmsensors package installed on all the nodes in question , the service is enabled and running ? the logs showing that it has connected to the collector. However in the GUI it just says ?Performance collector did not return any data? 3. The NSD nodes are returning performance data, but are all displaying a state of unknown. Would be great if anyone has any experience or ideas on how to troubleshoot this! Thanks Neil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From carlz at us.ibm.com Wed May 17 17:11:40 2017 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 17 May 2017 16:11:40 +0000 Subject: [gpfsug-discuss] Brief survey on GPFS / Scale usage from Scale Development Message-ID: An HTML attachment was scrubbed... 
URL: From Christian.Fey at sva.de Wed May 17 20:09:42 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Wed, 17 May 2017 19:09:42 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration In-Reply-To: References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: <310fef91208741b0b8059e805077f40e@sva.de> Hi, we have an existing filesystem and want to move from homebrew Samba/CTDB to CES. Since there is a lot of data in it, relabeling / migrating is not an option. FS stays the same, only nodes that share the FS change. There is an option to change the range (delete the existing ranges, set the new ones) with "net idmap set range" but in my Lab setup I was not successful in changing it. --cut-- [root at gpfs4n1 src]# /usr/lpp/mmfs/bin/net idmap set range 0 S-1-5-21-123456789-... Failed to save domain mapping: NT_STATUS_INVALID_PARAMETER --cut-- Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Varun Mittal3 Gesendet: Montag, 15. Mai 2017 20:39 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for "Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster w]"Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The ol From: "Fey, Christian" > To: gpfsug main discussion list > Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? 
Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From Christian.Fey at sva.de Wed May 17 20:37:36 2017 From: Christian.Fey at sva.de (Fey, Christian) Date: Wed, 17 May 2017 19:37:36 +0000 Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration References: <614b7dd390f84296bc1ee3a5cd9feb2a@sva.de> Message-ID: <38b79da90bfc4c549c5971f06cfaf5e5@sva.de> I just got the information that there is a debugging switch for the "net" commands (-d10). Looks like the issue with setting the ranges is caused by my lab setup (complains that the ranges are still present). I will try again with a scratched config and report back. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: Fey, Christian Gesendet: Mittwoch, 17. Mai 2017 21:10 An: gpfsug main discussion list Betreff: AW: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi, we have an existing filesystem and want to move from homebrew Samba/CTDB to CES. Since there is a lot of data in it, relabeling / migrating is not an option. FS stays the same, only nodes that share the FS change. There is an option to change the range (delete the existing ranges, set the new ones) with "net idmap set range" but in my Lab setup I was not successful in changing it. --cut-- [root at gpfs4n1 src]# /usr/lpp/mmfs/bin/net idmap set range 0 S-1-5-21-123456789-... Failed to save domain mapping: NT_STATUS_INVALID_PARAMETER --cut-- Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 Mobil: +49 151 180 251 39 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Varun Mittal3 Gesendet: Montag, 15. Mai 2017 20:39 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Hi Christian Is the data on existing cluster being accesses only through SMB or is it shared with NFS users having same UID/GIDs ? 
What mechanism would you be using to migrate the data ? I mean, if it's pure smb and the data migration would also be over smb share only (using tool like robocopy), you need not have the same IDs on both the source and the target system. Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for "Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster w]"Fey, Christian" ---11/05/2017 09:07:57 PM---Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The ol From: "Fey, Christian" > To: gpfsug main discussion list > Date: 11/05/2017 09:07 PM Subject: [gpfsug-discuss] Samba (rid) -> CES (autorid) migration Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, I'm just doing a migration from a gpfs cluster with samba/ctdb to CES protocol nodes. The old ctdb cluster uses rid as idmap backend. Since CES officially supports only autorid, I tried to choose the right values for the idmap ranges / sizes to get the same IDs but was not successful with this. The old samba has a range assigned for their Active directory (idmap config XYZ : range = 1000000-1999999) My idea was to set autorid to the following during mmuserauth create: idmap config * : backend = autorid idmap config * : range = 200000-2999999 idmap config * : rangesize = 800000 With that it should use the first range for the builtin range and the second should then start with 1000000 like in the old rid config. Sadly, the range of the domain is the third one: /usr/lpp/mmfs/bin/net idmap get ranges RANGE 0: ALLOC RANGE 1: S-1-5-32 RANGE 2: S-1-5-21-123456789-123456789-123456789 Does anyone have an idea how to fix this, maybe in a supported way and without storing the IDs in the domain? Further on, does anyone use rid as backend, even if not officially supported? Maybe we could file a RPQ or sth. Like this. Mit freundlichen Gr??en / Best Regards Christian Fey SVA System Vertrieb Alexander GmbH Borsigstra?e 14 65205 Wiesbaden Tel.: +49 6122 536-0 Fax: +49 6122 536-399 E-Mail: christian.fey at sva.de http://www.sva.de Gesch?ftsf?hrung: Philipp Alexander, Sven Eichelbaum Sitz der Gesellschaft: Wiesbaden Registergericht: Amtsgericht Wiesbaden, HRB 10315 [attachment "smime.p7s" deleted by Varun Mittal3/India/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5467 bytes Desc: not available URL: From pinto at scinet.utoronto.ca Wed May 17 21:44:47 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 16:44:47 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Message-ID: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? 
Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From luis.bolinches at fi.ibm.com Wed May 17 21:49:35 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 17 May 2017 23:49:35 +0300 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> Message-ID: Hi have you tried to add exceptions on the TSM client config file? Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 17-05-17 23:44 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 
These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 00:48:58 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 19:48:58 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> Message-ID: <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Quoting "Luis Bolinches" : > Hi > > have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. 
> > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From pinto at scinet.utoronto.ca Thu May 18 02:43:29 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 17 May 2017 21:43:29 -0400 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> References: <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. 
-------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : > Quoting "Luis Bolinches" : > >> Hi >> >> have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > >> >> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked >> on /IBM/GPFS/FSET1 >> >> dsm.sys >> ... >> >> DOMAIN /IBM/GPFS >> EXCLUDE.DIR /IBM/GPFS/FSET1 >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" >> Date: 17-05-17 23:44 >> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >> * project3 >> * scratch3 >> * sysadmin3 >> >> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >> have no need or space to include *scratch3* on TSM. >> >> Question: how to craft the mmbackup command to backup >> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >> >> Below are 3 types of errors: >> >> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> >> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >> dependent fileset sysadmin3 is not supported >> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> >> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope filesystem --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. 
>> >> These examples don't really cover my case: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >> Thanks >> Jaime >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From dominic.mueller at de.ibm.com Thu May 18 07:09:31 2017 From: dominic.mueller at de.ibm.com (Dominic Mueller-Wicke01) Date: Thu, 18 May 2017 06:09:31 +0000 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu May 18 07:09:33 2017 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 May 2017 06:09:33 +0000 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors In-Reply-To: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... 
URL: From p.childs at qmul.ac.uk Thu May 18 10:08:20 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 May 2017 10:08:20 +0100 Subject: [gpfsug-discuss] AFM Prefetch Missing Files In-Reply-To: References: Message-ID: Further investigation and checking says 4.2.1 afmctl prefetch is missing empty directories (not files as said previously) and noted by the update in 4.2.2.3. However I've found it is also missing symlinks both dangling (pointing to files that don't exist) and not. I can't see any actual data loss which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset) It looks like this should catch everything, But I'm wondering if anyone else has noticed any other things afmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: > I know it was said at the User group meeting last week that older > versions of afm prefetch miss empty files and that this is now fixed > in 4.2.2.3. > > We are in the middle of trying to migrate our files to a new > filesystem, and since that was said I'm double checking for any > mistakes etc. > > Anyway it looks like AFM prefetch also misses symlinks pointing to > files that that don't exist. ie "dangling symlinks" or ones that point > to files that either have not been created yet or have subsequently > been deleted. or when files have been decompressed and a symlink > extracted that points somewhere that is never going to exist. > > I'm still checking this, and as yet it does not look like its a data > loss issue, but it could still cause things to not quiet work once the > file migration is complete. > > Does anyone else know of any other types of files that might be missed > and I need to be aware of? > > We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" > using a gpfs policy to collect the list, we are using GPFS > Multi-cluster to connect the two filesystems not NFS.... > > Thanks in advanced > > > Peter Childs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From neil.wilson at metoffice.gov.uk Thu May 18 10:24:53 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Thu, 18 May 2017 09:24:53 +0000 Subject: [gpfsug-discuss] AFM Prefetch Missing Files In-Reply-To: References: Message-ID: We recently migrated several hundred TB from an Isilon cluster to our GPFS cluster using AFM using NFS gateways mostly using 4.2.2.2 , the main thing we noticed was that it would not migrate empty directories - we worked around the issue by getting a list of the missing directories and running it through a simple script that cd's into each directory then lists the empty directory. I didn't come across any issues with symlinks not being prefetched, just the directories. Regards Neil Wilson -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Peter Childs Sent: 18 May 2017 10:08 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] AFM Prefetch Missing Files Further investigation and checking says 4.2.1 afmctl prefetch is missing empty directories (not files as said previously) and noted by the update in 4.2.2.3. 
However I've found it is also missing symlinks both dangling (pointing to files that don't exist) and not. I can't see any actual data loss which is good. I'm looking to work around this with find /data2/$fileset -noleaf \( \( -type d -empty \) -o \( -type l \) \) -printf "%p -> %l\n" My initial testing says this should work. (/data2/$fileset is the destination "cache" fileset) It looks like this should catch everything, But I'm wondering if anyone else has noticed any other things afmctl prefetch misses. Thanks in advance Peter Childs On 16/05/17 10:40, Peter Childs wrote: > I know it was said at the User group meeting last week that older > versions of afm prefetch miss empty files and that this is now fixed > in 4.2.2.3. > > We are in the middle of trying to migrate our files to a new > filesystem, and since that was said I'm double checking for any > mistakes etc. > > Anyway it looks like AFM prefetch also misses symlinks pointing to > files that that don't exist. ie "dangling symlinks" or ones that point > to files that either have not been created yet or have subsequently > been deleted. or when files have been decompressed and a symlink > extracted that points somewhere that is never going to exist. > > I'm still checking this, and as yet it does not look like its a data > loss issue, but it could still cause things to not quiet work once the > file migration is complete. > > Does anyone else know of any other types of files that might be missed > and I need to be aware of? > > We are using 4.2.1-3 and prefetch was done using "mmafmctl prefetch" > using a gpfs policy to collect the list, we are using GPFS > Multi-cluster to connect the two filesystems not NFS.... > > Thanks in advanced > > > Peter Childs > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu May 18 14:33:29 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 09:33:29 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset.... If you didn't say otherwise, inodes come from the default "root" fileset.... Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned.... 
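A minimal sketch of the kind of workaround script Neil describes above, assuming (hypothetically) a plain-text file missing_dirs.txt with one not-yet-migrated cache-side directory path per line; the idea is that the lookup/readdir pulls each empty directory into the cache on demand:

    #!/bin/bash
    # missing_dirs.txt is a hypothetical name: one cache-side directory path per line
    while read -r d; do
        ( cd "$d" && ls -a > /dev/null )   # cd + readdir makes AFM materialise the empty directory
    done < missing_dirs.txt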
BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. ----- Original message ----- From: "Jaime Pinto" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. -------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : > Quoting "Luis Bolinches" : > >> Hi >> >> have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. 
> > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > >> >> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked >> on /IBM/GPFS/FSET1 >> >> dsm.sys >> ... >> >> DOMAIN /IBM/GPFS >> EXCLUDE.DIR /IBM/GPFS/FSET1 >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" >> Date: 17-05-17 23:44 >> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >> * project3 >> * scratch3 >> * sysadmin3 >> >> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >> have no need or space to include *scratch3* on TSM. >> >> Question: how to craft the mmbackup command to backup >> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >> >> Below are 3 types of errors: >> >> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> >> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >> dependent fileset sysadmin3 is not supported >> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> >> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope filesystem --tsm-errorlog $logfile -L 2 >> >> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >> cannot be specified at the same time. >> >> These examples don't really cover my case: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >> Thanks >> Jaime >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 14:58:51 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 09:58:51 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> Message-ID: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that are > in a separable range of inode numbers - this allows GPFS to efficiently do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? 
> > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor ESS, > so anyone in this list feel free to give feedback on that page people with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > >> Quoting "Luis Bolinches" : >> >>> Hi >>> >>> have you tried to add exceptions on the TSM client config file? >> >> Hey Luis, >> >> That would work as well (mechanically), however it's not elegant or >> efficient. When you have over 1PB and 200M files on scratch it will >> take many hours and several helper nodes to traverse that fileset just >> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >> Considering that I want to keep project and sysadmin on different >> domains then it's much worst, since we have to traverse and exclude >> scratch & (project|sysadmin) twice, once to capture sysadmin and again >> to capture project. 
>> >> If I have to use exclusion rules it has to rely sole on gpfs rules, and >> somehow not traverse scratch at all. >> >> I suspect there is a way to do this properly, however the examples on >> the gpfs guide and other references are not exhaustive. They only show >> a couple of trivial cases. >> >> However my situation is not unique. I suspect there are may facilities >> having to deal with backup of HUGE filesets. >> >> So the search is on. >> >> Thanks >> Jaime >> >> >> >> >>> >>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked >>> on /IBM/GPFS/FSET1 >>> >>> dsm.sys >>> ... >>> >>> DOMAIN /IBM/GPFS >>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" > >>> Date: 17-05-17 23:44 >>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>> * project3 >>> * scratch3 >>> * sysadmin3 >>> >>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>> have no need or space to include *scratch3* on TSM. >>> >>> Question: how to craft the mmbackup command to backup >>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>> >>> Below are 3 types of errors: >>> >>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> >>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>> dependent fileset sysadmin3 is not supported >>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> >>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope filesystem --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> These examples don't really cover my case: >>> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >>> >>> >>> Thanks >>> Jaime >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From p.childs at qmul.ac.uk Thu May 18 15:12:05 2017 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 May 2017 15:12:05 +0100 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> As I understand it, mmbackup calls mmapplypolicy so this stands for mmapplypolicy too..... mmapplypolicy scans the metadata inodes (file) as requested depending on the query supplied. You can ask mmapplypolicy to scan a fileset, inode space or filesystem. If scanning a fileset it scans the inode space that fileset is dependant on, for all files in that fileset. 
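To make the three scopes concrete, a sketch only -- rules.pol is a placeholder policy file and the paths are the ones from Jaime's thread:

    mmapplypolicy /gpfs/sgfs1           -P rules.pol -I test --scope filesystem   # every inode space in the file system
    mmapplypolicy /gpfs/sgfs1/sysadmin3 -P rules.pol -I test --scope inodespace   # the whole inode space that holds sysadmin3
    mmapplypolicy /gpfs/sgfs1/sysadmin3 -P rules.pol -I test --scope fileset      # same scan, results limited to the fileset itself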
Smaller inode spaces hence less to scan, hence its faster to use an independent filesets, you get a list of what to process quicker. Another advantage is that once an inode is allocated you can't deallocate it, however you can delete independent filesets and hence deallocate the inodes, so if you have a task which has losts and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them work on them and then blow them away afterwards. I like independent filesets I'm guessing the only reason dependant filesets are used by default is history..... Peter On 18/05/17 14:58, Jaime Pinto wrote: > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? > > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >> think >> and try to read that as "inode space". >> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes >> that are >> in a separable range of inode numbers - this allows GPFS to >> efficiently do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be >> represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >> errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. 
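A sketch of the throw-away fileset pattern Peter mentions above, with made-up names:

    mmcrfileset sgfs1 tmpwork --inode-space new
    mmlinkfileset sgfs1 tmpwork -J /gpfs/sgfs1/tmpwork
    # ... the job creates and churns through its millions of small files here ...
    mmunlinkfileset sgfs1 tmpwork -f
    mmdelfileset sgfs1 tmpwork -f    # deleting the independent fileset releases its whole inode space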
It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people >> with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >> >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. >> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. >> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... 
>>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> These examples don't really cover my case: >>>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. 
>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From a.g.richmond at leeds.ac.uk Thu May 18 15:22:55 2017 From: a.g.richmond at leeds.ac.uk (Aidan Richmond) Date: Thu, 18 May 2017 15:22:55 +0100 Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain Message-ID: Hello I'm trying to join an AD domain for SMB and NFS protocol sharing but I keep getting a "Failed to generate the kerberos keytab file" error. The command I'm running is /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type ad --netbios-name @name@ --servers @adserver@ --user-name @username@ --idmap-role master --enable-nfs-kerberos --unixmap-domains "DS(1000-9999999)" A correct keytab does appears to be created on the host I run this on (one of two protocol nodes) but not on the other one. -- Aidan Richmond Apple/Unix Support Officer, IT Garstang 10.137 Faculty of Biological Sciences University of Leeds Clarendon Way LS2 9JT Tel:0113 3434252 a.g.richmond at leeds.ac.uk From makaplan at us.ibm.com Thu May 18 15:23:30 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 10:23:30 -0400 Subject: [gpfsug-discuss] What is an independent fileset? 
was: mmbackup with fileset : scope errors In-Reply-To: <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset .... However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that are > in a separable range of inode numbers - this allows GPFS to efficiently do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... 
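A hedged sketch of the kind of FOR FILESET edit Marc suggests above; the rule and list names here are illustrative only, and the EXEC string and WHERE body are whatever a trial mmbackup run generates:

    /* keep the generated EXTERNAL LIST rule exactly as mmbackup wrote it */
    RULE EXTERNAL LIST 'mmbackup.sgfs1' EXEC '...'

    /* the generated selection rule, with only the FOR FILESET clause added by hand */
    RULE 'selectForBackup' LIST 'mmbackup.sgfs1'
         FOR FILESET ('project3','sysadmin3')
         WHERE ...   /* leave the generated WHERE expression untouched */

The edited file then goes back in via mmbackup -P, per Marc's note, so mmbackup uses these rules rather than regenerating its own.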
If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor ESS, > so anyone in this list feel free to give feedback on that page people with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > >> Quoting "Luis Bolinches" : >> >>> Hi >>> >>> have you tried to add exceptions on the TSM client config file? >> >> Hey Luis, >> >> That would work as well (mechanically), however it's not elegant or >> efficient. When you have over 1PB and 200M files on scratch it will >> take many hours and several helper nodes to traverse that fileset just >> to be negated by TSM. In fact exclusion on TSM are just as inefficient. 
>> Considering that I want to keep project and sysadmin on different >> domains then it's much worst, since we have to traverse and exclude >> scratch & (project|sysadmin) twice, once to capture sysadmin and again >> to capture project. >> >> If I have to use exclusion rules it has to rely sole on gpfs rules, and >> somehow not traverse scratch at all. >> >> I suspect there is a way to do this properly, however the examples on >> the gpfs guide and other references are not exhaustive. They only show >> a couple of trivial cases. >> >> However my situation is not unique. I suspect there are may facilities >> having to deal with backup of HUGE filesets. >> >> So the search is on. >> >> Thanks >> Jaime >> >> >> >> >>> >>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked >>> on /IBM/GPFS/FSET1 >>> >>> dsm.sys >>> ... >>> >>> DOMAIN /IBM/GPFS >>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" > >>> Date: 17-05-17 23:44 >>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>> * project3 >>> * scratch3 >>> * sysadmin3 >>> >>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>> have no need or space to include *scratch3* on TSM. >>> >>> Question: how to craft the mmbackup command to backup >>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>> >>> Below are 3 types of errors: >>> >>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> >>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>> dependent fileset sysadmin3 is not supported >>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> >>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope filesystem --tsm-errorlog $logfile -L 2 >>> >>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>> cannot be specified at the same time. >>> >>> These examples don't really cover my case: >>> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >>> >>> >>> Thanks >>> Jaime >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of > Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu May 18 15:24:17 2017 From: david_johnson at brown.edu (David D. Johnson) Date: Thu, 18 May 2017 10:24:17 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> Message-ID: <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Here is one big reason independent filesets are problematic: A5.13: Table 43. 
Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University > On May 18, 2017, at 10:12 AM, Peter Childs wrote: > > As I understand it, > > mmbackup calls mmapplypolicy so this stands for mmapplypolicy too..... > > mmapplypolicy scans the metadata inodes (file) as requested depending on the query supplied. > > You can ask mmapplypolicy to scan a fileset, inode space or filesystem. > > If scanning a fileset it scans the inode space that fileset is dependant on, for all files in that fileset. Smaller inode spaces hence less to scan, hence its faster to use an independent filesets, you get a list of what to process quicker. > > Another advantage is that once an inode is allocated you can't deallocate it, however you can delete independent filesets and hence deallocate the inodes, so if you have a task which has losts and lots of small files which are only needed for a short period of time, you can create a new independent fileset for them work on them and then blow them away afterwards. > > I like independent filesets I'm guessing the only reason dependant filesets are used by default is history..... > > > Peter > > > On 18/05/17 14:58, Jaime Pinto wrote: >> Thanks for the explanation Mark and Luis, >> >> It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. >> >> When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one way or another. >> >> I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that are >>> in a separable range of inode numbers - this allows GPFS to efficiently do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset.... If you didn't say otherwise, inodes come from the >>> default "root" fileset.... >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? 
Stay tuned.... >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss at spectrumscale.org >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor ESS, >>> so anyone in this list feel free to give feedback on that page people with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the data from that fileset to a new >>> independent fileset (multiple ways to do that) or use the TSM client >>> config. >>> >>> ----- Original message ----- >>> From: "Jaime Pinto" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: "gpfsug main discussion list" , >>> "Jaime Pinto" >>> Cc: >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Date: Thu, May 18, 2017 4:43 AM >>> >>> There is hope. See reference link below: >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >>> >>> >>> The issue has to do with dependent vs. independent filesets, something >>> I didn't even realize existed until now. Our filesets are dependent >>> (for no particular reason), so I have to find a way to turn them into >>> independent. >>> >>> The proper option syntax is "--scope inodespace", and the error >>> message actually flagged that out, however I didn't know how to >>> interpret what I saw: >>> >>> >>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> -------------------------------------------------------- >>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>> 21:27:43 EDT 2017. >>> -------------------------------------------------------- >>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>> fileset sysadmin3 is not supported >>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> -------------------------------------------------------- >>> >>> Will post the outcome. >>> Jaime >>> >>> >>> >>> Quoting "Jaime Pinto" : >>> >>>> Quoting "Luis Bolinches" : >>>> >>>>> Hi >>>>> >>>>> have you tried to add exceptions on the TSM client config file? >>>> >>>> Hey Luis, >>>> >>>> That would work as well (mechanically), however it's not elegant or >>>> efficient. When you have over 1PB and 200M files on scratch it will >>>> take many hours and several helper nodes to traverse that fileset just >>>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. 
>>>> Considering that I want to keep project and sysadmin on different >>>> domains then it's much worst, since we have to traverse and exclude >>>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>>> to capture project. >>>> >>>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>>> somehow not traverse scratch at all. >>>> >>>> I suspect there is a way to do this properly, however the examples on >>>> the gpfs guide and other references are not exhaustive. They only show >>>> a couple of trivial cases. >>>> >>>> However my situation is not unique. I suspect there are may facilities >>>> having to deal with backup of HUGE filesets. >>>> >>>> So the search is on. >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>>> >>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>> linked >>>>> on /IBM/GPFS/FSET1 >>>>> >>>>> dsm.sys >>>>> ... >>>>> >>>>> DOMAIN /IBM/GPFS >>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>> >>>>> >>>>> From: "Jaime Pinto" >>>>> To: "gpfsug main discussion list" >>> >>>>> Date: 17-05-17 23:44 >>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> >>>>> >>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>> * project3 >>>>> * scratch3 >>>>> * sysadmin3 >>>>> >>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>> have no need or space to include *scratch3* on TSM. >>>>> >>>>> Question: how to craft the mmbackup command to backup >>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>> >>>>> Below are 3 types of errors: >>>>> >>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>> dependent fileset sysadmin3 is not supported >>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>> fileset level backup. exit 1 >>>>> >>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> These examples don't really cover my case: >>>>> >>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >>> >>>>> >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>>> Toronto. 
>>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>> Oy IBM Finland Ab >>>>> PL 265, 00101 Helsinki, Finland >>>>> Business ID, Y-tunnus: 0195876-3 >>>>> Registered in Finland >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From r.sobey at imperial.ac.uk Thu May 18 15:32:42 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:32:42 +0000 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Message-ID: Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of David D. Johnson Sent: 18 May 2017 15:24 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Here is one big reason independent filesets are problematic: A5.13: Table 43. Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 18 15:36:33 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:36:33 +0000 Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain In-Reply-To: References: Message-ID: It's crappy, I had to put the command in 10+ times before it would work. Just keep at it (that's my takeaway, sorry I'm not that technical when it comes to Kerberos). Could be a domain replication thing. Is time syncing properly across all your CES nodes? Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aidan Richmond Sent: 18 May 2017 15:23 To: gpfsug main discussion list Subject: [gpfsug-discuss] Keytab error trying to join an active directory domain Hello I'm trying to join an AD domain for SMB and NFS protocol sharing but I keep getting a "Failed to generate the kerberos keytab file" error. The command I'm running is /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type ad --netbios-name @name@ --servers @adserver@ --user-name @username@ --idmap-role master --enable-nfs-kerberos --unixmap-domains "DS(1000-9999999)" A correct keytab does appears to be created on the host I run this on (one of two protocol nodes) but not on the other one. 
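
A minimal sanity check along the lines Richard suggests, before fighting the join command again -- the keytab path and the node names ces1/ces2 below are placeholders, not taken from this thread (adjust to wherever the keytab is actually written on your protocol nodes):

# compare clocks and keytab contents on both protocol nodes; Kerberos
# joins fail in odd ways when the clocks have drifted apart
for node in ces1 ces2; do
    ssh $node 'hostname; date; chronyc tracking 2>/dev/null || ntpstat'
    ssh $node 'klist -k /etc/krb5.keytab | head'    # which service principals landed on this node
done

If the second node shows no keytab entries or a noticeable clock offset, fixing time sync and re-running mmuserauth (as Richard says, it can take several attempts) is the obvious next step.
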
-- Aidan Richmond Apple/Unix Support Officer, IT Garstang 10.137 Faculty of Biological Sciences University of Leeds Clarendon Way LS2 9JT Tel:0113 3434252 a.g.richmond at leeds.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ulmer at ulmer.org Thu May 18 15:47:59 2017 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 18 May 2017 10:47:59 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> Message-ID: <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn?t a single point of contention for creating files. I?m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I?m sure that limit could be changed eventually, there?s probably some efficiencies in not making it much bigger than it needs to be. I don?t know if it would take an on-disk format change or not. So how do you decide that a use case gets it?s own fileset, and do you just always use independent or is there an evaluation? I?m just curious because I like to understand lots of different points of view ? feel free to tell me to go away. :) -- Stephen > On May 18, 2017, at 10:32 AM, Sobey, Richard A > wrote: > > Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. > > I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org ]On Behalf Of David D. Johnson > Sent: 18 May 2017 15:24 > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors > > Here is one big reason independent filesets are problematic: > A5.13: > Table 43. Maximum number of filesets > Version of GPFS > Maximum Number of Dependent Filesets > Maximum Number of Independent Filesets > IBM Spectrum Scale V4 > 10,000 > 1,000 > GPFS V3.5 > 10,000 > 1,000 > Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. > If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. > This is true of the root namespace as well, but there?s only one number to watch per filesystem. > > ? 
ddj > Dave Johnson > Brown University > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu May 18 15:58:20 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 18 May 2017 14:58:20 +0000 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <209c84ca-39a9-947b-6084-3addc24e94a8@qmul.ac.uk> <54C5E287-DD8A-44A9-B922-9A954CD08FBE@brown.edu> <578DD835-3829-431D-9E89-A3DFDCDA8F39@ulmer.org> Message-ID: So it could be that we didn?t really know what we were doing when our system was installed (and still don?t by some of the messages I post *cough*) but basically I think we?re quite similar to other shops where we resell GPFS to departmental users internally and it just made some sense to break down each one into a fileset. We can then snapshot each one individually (7402 snapshots at the moment) and apply quotas. I know your question was why independent and not dependent ? but I honestly don?t know. I assume it?s to do with not crossing the streams if you?ll excuse the obvious film reference. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stephen Ulmer Sent: 18 May 2017 15:48 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Each independent fileset is an allocation area, and they are (I believe) handled separately. There are a set of allocation managers for each file system, and when you need to create a file you ask one of them to do it. Each one has a pre-negotiated range of inodes to hand out, so there isn?t a single point of contention for creating files. I?m pretty sure that means that they all have to have a range for each inode space. This is based on my own logic, and could be complete nonsense. While I?m sure that limit could be changed eventually, there?s probably some efficiencies in not making it much bigger than it needs to be. I don?t know if it would take an on-disk format change or not. So how do you decide that a use case gets it?s own fileset, and do you just always use independent or is there an evaluation? I?m just curious because I like to understand lots of different points of view ? feel free to tell me to go away. :) -- Stephen On May 18, 2017, at 10:32 AM, Sobey, Richard A > wrote: Thanks, I was just about to post that, and I guess is still the reason a dependent fileset is still the default without the ?inode-space new option fileset creation. I do wonder why there is a limit of 1000, whether it?s just IBM not envisaging any customer needing more than that? We?ve only got 414 at the moment but that will grow to over 500 this year. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org]On Behalf Of David D. Johnson Sent: 18 May 2017 15:24 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] What is an independent fileset? 
was: mmbackup with fileset : scope errors Here is one big reason independent filesets are problematic: A5.13: Table 43. Maximum number of filesets Version of GPFS Maximum Number of Dependent Filesets Maximum Number of Independent Filesets IBM Spectrum Scale V4 10,000 1,000 GPFS V3.5 10,000 1,000 Another is that each independent fileset must be sized (and resized) for the number of inodes it is expected to contain. If that runs out (due to growth or a runaway user job), new files cannot be created until the inode limit is bumped up. This is true of the root namespace as well, but there?s only one number to watch per filesystem. ? ddj Dave Johnson Brown University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Thu May 18 16:15:30 2017 From: chair at spectrumscale.org (Spectrum Scale UG Chair (Simon Thompson)) Date: Thu, 18 May 2017 16:15:30 +0100 Subject: [gpfsug-discuss] Save the date SSUG 2018 - April 18th/19th 2018 Message-ID: Hi All, A date for your diary, #SSUG18 in the UK will be taking place on: April 18th, 19th 2018 Please mark it in your diaries now :-) We'll confirm other details etc nearer the time, but date is confirmed. Simon From john.hearns at asml.com Thu May 18 16:23:29 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 18 May 2017 15:23:29 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Good afternoon all, my name is John Hearns. I am currently working with the HPC Team at ASML in the Netherlands, the market sector is manufacturing. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 17:36:46 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 12:36:46 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> Message-ID: <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. 
It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
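
To make the -P route discussed above a bit more concrete, here is a minimal sketch of the kind of customized rules file that could be handed back to mmbackup once the generated rules have been captured. The rule names and the EXEC path are placeholders rather than mmbackup's real generated output, and the fileset names are the ones from this thread; the idea is to keep the captured EXTERNAL LIST / LIST rules exactly as generated (EXEC paths, SHOW clauses and all) and only add the EXCLUDE rule and the FOR FILESET clause:

/* placeholder for the EXTERNAL LIST rule mmbackup generated -- keep it as captured */
RULE EXTERNAL LIST 'mmbackupChanged' EXEC '/path/emitted/by/mmbackup'

/* added: exclude the fileset that must never reach TSM */
RULE 'skipScratch' EXCLUDE FOR FILESET('scratch3')

/* the captured LIST rule, with the added FOR FILESET restriction */
RULE 'backupWanted' LIST 'mmbackupChanged'
     FOR FILESET('project3','sysadmin3')

and then something along the lines of

mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm -P /path/to/custom.rules --tsm-errorlog $logfile -L 2

with the obvious caveat: do a dry run, and test a restore, before trusting the result.
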
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think >> and try to read that as "inode space". >> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes that > are >> in a separable range of inode numbers - this allows GPFS to efficiently > do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be > represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people > with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. 
>> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. >> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... >>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. 
>>>> >>>> These examples don't really cover my case: >>>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. 
(MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From makaplan at us.ibm.com Thu May 18 18:05:59 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 May 2017 13:05:59 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Message-ID: 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air.... Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan..... Plan your Test and Test your Plan.... Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM.... otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. 
Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such thing. > Filesets are kinda like little filesystems within filesystems. Moving a > file from one fileset to another requires a copy operation. There is no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? > > Thanks > Jaime > > Quoting "Marc A Kaplan" : > >> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think >> and try to read that as "inode space". 
>> >> An "independent fileset" has all the attributes of an (older-fashioned) >> dependent fileset PLUS all of its files are represented by inodes that > are >> in a separable range of inode numbers - this allows GPFS to efficiently > do >> snapshots of just that inode-space (uh... independent fileset)... >> >> And... of course the files of dependent filesets must also be > represented >> by inodes -- those inode numbers are within the inode-space of whatever >> the containing independent fileset is... as was chosen when you created >> the fileset.... If you didn't say otherwise, inodes come from the >> default "root" fileset.... >> >> Clear as your bath-water, no? >> >> So why does mmbackup care one way or another ??? Stay tuned.... >> >> BTW - if you look at the bits of the inode numbers carefully --- you may >> not immediately discern what I mean by a "separable range of inode >> numbers" -- (very technical hint) you may need to permute the bit order >> before you discern a simple pattern... >> >> >> >> From: "Luis Bolinches" >> To: gpfsug-discuss at spectrumscale.org >> Cc: gpfsug-discuss at spectrumscale.org >> Date: 05/18/2017 02:10 AM >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Hi >> >> There is no direct way to convert the one fileset that is dependent to >> independent or viceversa. >> >> I would suggest to take a look to chapter 5 of the 2014 redbook, lots of >> definitions about GPFS ILM including filesets >> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only >> place that is explained but I honestly believe is a good single start >> point. It also needs an update as does nto have anything on CES nor ESS, >> so anyone in this list feel free to give feedback on that page people > with >> funding decisions listen there. >> >> So you are limited to either migrate the data from that fileset to a new >> independent fileset (multiple ways to do that) or use the TSM client >> config. >> >> ----- Original message ----- >> From: "Jaime Pinto" >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> To: "gpfsug main discussion list" , >> "Jaime Pinto" >> Cc: >> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >> Date: Thu, May 18, 2017 4:43 AM >> >> There is hope. See reference link below: >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >> >> The issue has to do with dependent vs. independent filesets, something >> I didn't even realize existed until now. Our filesets are dependent >> (for no particular reason), so I have to find a way to turn them into >> independent. >> >> The proper option syntax is "--scope inodespace", and the error >> message actually flagged that out, however I didn't know how to >> interpret what I saw: >> >> >> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >> --scope inodespace --tsm-errorlog $logfile -L 2 >> -------------------------------------------------------- >> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >> 21:27:43 EDT 2017. >> -------------------------------------------------------- >> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >> fileset sysadmin3 is not supported >> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >> fileset level backup. exit 1 >> -------------------------------------------------------- >> >> Will post the outcome. 
>> Jaime >> >> >> >> Quoting "Jaime Pinto" : >> >>> Quoting "Luis Bolinches" : >>> >>>> Hi >>>> >>>> have you tried to add exceptions on the TSM client config file? >>> >>> Hey Luis, >>> >>> That would work as well (mechanically), however it's not elegant or >>> efficient. When you have over 1PB and 200M files on scratch it will >>> take many hours and several helper nodes to traverse that fileset just >>> to be negated by TSM. In fact exclusion on TSM are just as inefficient. >>> Considering that I want to keep project and sysadmin on different >>> domains then it's much worst, since we have to traverse and exclude >>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>> to capture project. >>> >>> If I have to use exclusion rules it has to rely sole on gpfs rules, and >>> somehow not traverse scratch at all. >>> >>> I suspect there is a way to do this properly, however the examples on >>> the gpfs guide and other references are not exhaustive. They only show >>> a couple of trivial cases. >>> >>> However my situation is not unique. I suspect there are may facilities >>> having to deal with backup of HUGE filesets. >>> >>> So the search is on. >>> >>> Thanks >>> Jaime >>> >>> >>> >>> >>>> >>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >> linked >>>> on /IBM/GPFS/FSET1 >>>> >>>> dsm.sys >>>> ... >>>> >>>> DOMAIN /IBM/GPFS >>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>> >>>> >>>> From: "Jaime Pinto" >>>> To: "gpfsug main discussion list" >> >>>> Date: 17-05-17 23:44 >>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>> * project3 >>>> * scratch3 >>>> * sysadmin3 >>>> >>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>> have no need or space to include *scratch3* on TSM. >>>> >>>> Question: how to craft the mmbackup command to backup >>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>> >>>> Below are 3 types of errors: >>>> >>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>> dependent fileset sysadmin3 is not supported >>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> >>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>> >>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>> cannot be specified at the same time. >>>> >>>> These examples don't really cover my case: >>>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>>> >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. 
(MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >> Oy IBM Finland Ab >> PL 265, 00101 Helsinki, Finland >> Business ID, Y-tunnus: 0195876-3 >> Registered in Finland >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Thu May 18 20:02:46 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Thu, 18 May 2017 15:02:46 -0400 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca>, <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> Message-ID: <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > thin air.... Capture the rules mmbackup creates and make small changes to > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > Plan.... Then do some dry run recoveries before you really "need" to do a > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. 
> > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. It would be nice to get comments from > somebody familiar with the inner works of mmbackup. > > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > >> Jaime, >> >> While we're waiting for the mmbackup expert to weigh in, notice that > the >> mmbackup command does have a -P option that allows you to provide a >> customized policy rules file. >> >> So... a fairly safe hack is to do a trial mmbackup run, capture the >> automatically generated policy file, and then augment it with FOR >> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for >> real with your customized policy file. >> >> mmbackup uses mmapplypolicy which by itself is happy to limit its >> directory scan to a particular fileset by using >> >> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope >> fileset .... >> >> However, mmbackup probably has other worries and for simpliciity and >> helping make sure you get complete, sensible backups, apparently has >> imposed some restrictions to preserve sanity (yours and our support > team! >> ;-) ) ... (For example, suppose you were doing incremental backups, >> starting at different paths each time? -- happy to do so, but when >> disaster strikes and you want to restore -- you'll end up confused > and/or >> unhappy!) >> >> "converting from one fileset to another" --- sorry there is no such > thing. >> Filesets are kinda like little filesystems within filesystems. Moving > a >> file from one fileset to another requires a copy operation. There is > no >> fast move nor hardlinking. >> >> --marc >> >> >> >> From: "Jaime Pinto" >> To: "gpfsug main discussion list" > , >> "Marc A Kaplan" >> Date: 05/18/2017 09:58 AM >> Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: >> mmbackup with fileset : scope errors >> >> >> >> Thanks for the explanation Mark and Luis, >> >> It begs the question: why filesets are created as dependent by >> default, if the adverse repercussions can be so great afterward? Even >> in my case, where I manage GPFS and TSM deployments (and I have been >> around for a while), didn't realize at all that not adding and extra >> option at fileset creation time would cause me huge trouble with >> scaling later on as I try to use mmbackup. >> >> When you have different groups to manage file systems and backups that >> don't read each-other's manuals ahead of time then we have a really >> bad recipe. >> >> I'm looking forward to your explanation as to why mmbackup cares one >> way or another. 
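As a rough sketch of the capture-and-modify workflow discussed above (trial run, grab the autogenerated ruleset, edit it, feed it back with -P). The helper node, log path and exact ruleset file name below are illustrative assumptions, not tested output; the generated file simply matches .mmbackupRules* under /var/mmfs/mmbackup/ as noted elsewhere in this thread:

# 1) Trial run; shortly after preflight mmbackup writes its generated
#    policy rules under /var/mmfs/mmbackup/
mmbackup /gpfs/sgfs1 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog /tmp/tsm.err -L 2

# 2) While the run is in flight, keep a copy of the autogenerated ruleset
#    before it is cleaned up (exact name varies; it matches .mmbackupRules*)
cp /var/mmfs/mmbackup/.mmbackupRules.sgfs1 /root/mmbackupRules.sysadmin3

# 3) Edit the copy -- e.g. add FOR FILESET('sysadmin3') to the LIST rules and
#    an EXCLUDE rule FOR FILESET('scratch3','project3') -- then re-run with -P
mmbackup /gpfs/sgfs1 -P /root/mmbackupRules.sysadmin3 -N tsm-helper1-ib0 \
    -s /dev/shm --tsm-errorlog /tmp/tsm.err -L 2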
>> >> I'm also hoping for a hint as to how to configure backup exclusion >> rules on the TSM side to exclude fileset traversing on the GPFS side. >> Is mmbackup smart enough (actually smarter than TSM client itself) to >> read the exclusion rules on the TSM configuration and apply them >> before traversing? >> >> Thanks >> Jaime >> >> Quoting "Marc A Kaplan" : >> >>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >> think >>> and try to read that as "inode space". >>> >>> An "independent fileset" has all the attributes of an (older-fashioned) >>> dependent fileset PLUS all of its files are represented by inodes that >> are >>> in a separable range of inode numbers - this allows GPFS to efficiently >> do >>> snapshots of just that inode-space (uh... independent fileset)... >>> >>> And... of course the files of dependent filesets must also be >> represented >>> by inodes -- those inode numbers are within the inode-space of whatever >>> the containing independent fileset is... as was chosen when you created >>> the fileset.... If you didn't say otherwise, inodes come from the >>> default "root" fileset.... >>> >>> Clear as your bath-water, no? >>> >>> So why does mmbackup care one way or another ??? Stay tuned.... >>> >>> BTW - if you look at the bits of the inode numbers carefully --- you > may >>> not immediately discern what I mean by a "separable range of inode >>> numbers" -- (very technical hint) you may need to permute the bit order >>> before you discern a simple pattern... >>> >>> >>> >>> From: "Luis Bolinches" >>> To: gpfsug-discuss at spectrumscale.org >>> Cc: gpfsug-discuss at spectrumscale.org >>> Date: 05/18/2017 02:10 AM >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >> errors >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> >>> >>> >>> Hi >>> >>> There is no direct way to convert the one fileset that is dependent to >>> independent or viceversa. >>> >>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of >>> definitions about GPFS ILM including filesets >>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only >>> place that is explained but I honestly believe is a good single start >>> point. It also needs an update as does nto have anything on CES nor > ESS, >>> so anyone in this list feel free to give feedback on that page people >> with >>> funding decisions listen there. >>> >>> So you are limited to either migrate the data from that fileset to a > new >>> independent fileset (multiple ways to do that) or use the TSM client >>> config. >>> >>> ----- Original message ----- >>> From: "Jaime Pinto" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: "gpfsug main discussion list" , >>> "Jaime Pinto" >>> Cc: >>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>> Date: Thu, May 18, 2017 4:43 AM >>> >>> There is hope. See reference link below: >>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > >> >>> >>> >>> The issue has to do with dependent vs. independent filesets, something >>> I didn't even realize existed until now. Our filesets are dependent >>> (for no particular reason), so I have to find a way to turn them into >>> independent. 
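Since there is no in-place conversion, the route sketched in this thread is a new independent fileset plus a data copy. A minimal, untested sketch of that follows; fileset and path names are illustrative, and a real migration needs planning for quotas, ACLs and downtime:

# Create a fileset with its own inode space and link it at a temporary junction
mmcrfileset sgfs1 sysadmin3new --inode-space new
mmlinkfileset sgfs1 sysadmin3new -J /gpfs/sgfs1/sysadmin3new

# Copy the data across (any tool that preserves ACLs/xattrs will do)
rsync -aHAX /gpfs/sgfs1/sysadmin3/ /gpfs/sgfs1/sysadmin3new/

# Retire the old dependent fileset, then rename/relink the new one in its place
mmunlinkfileset sgfs1 sysadmin3
mmdelfileset sgfs1 sysadmin3 -f
mmunlinkfileset sgfs1 sysadmin3new
mmchfileset sgfs1 sysadmin3new -j sysadmin3
mmlinkfileset sgfs1 sysadmin3 -J /gpfs/sgfs1/sysadmin3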
>>> >>> The proper option syntax is "--scope inodespace", and the error >>> message actually flagged that out, however I didn't know how to >>> interpret what I saw: >>> >>> >>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>> --scope inodespace --tsm-errorlog $logfile -L 2 >>> -------------------------------------------------------- >>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>> 21:27:43 EDT 2017. >>> -------------------------------------------------------- >>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>> fileset sysadmin3 is not supported >>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>> fileset level backup. exit 1 >>> -------------------------------------------------------- >>> >>> Will post the outcome. >>> Jaime >>> >>> >>> >>> Quoting "Jaime Pinto" : >>> >>>> Quoting "Luis Bolinches" : >>>> >>>>> Hi >>>>> >>>>> have you tried to add exceptions on the TSM client config file? >>>> >>>> Hey Luis, >>>> >>>> That would work as well (mechanically), however it's not elegant or >>>> efficient. When you have over 1PB and 200M files on scratch it will >>>> take many hours and several helper nodes to traverse that fileset just >>>> to be negated by TSM. In fact exclusion on TSM are just as > inefficient. >>>> Considering that I want to keep project and sysadmin on different >>>> domains then it's much worst, since we have to traverse and exclude >>>> scratch & (project|sysadmin) twice, once to capture sysadmin and again >>>> to capture project. >>>> >>>> If I have to use exclusion rules it has to rely sole on gpfs rules, > and >>>> somehow not traverse scratch at all. >>>> >>>> I suspect there is a way to do this properly, however the examples on >>>> the gpfs guide and other references are not exhaustive. They only show >>>> a couple of trivial cases. >>>> >>>> However my situation is not unique. I suspect there are may facilities >>>> having to deal with backup of HUGE filesets. >>>> >>>> So the search is on. >>>> >>>> Thanks >>>> Jaime >>>> >>>> >>>> >>>> >>>>> >>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>> linked >>>>> on /IBM/GPFS/FSET1 >>>>> >>>>> dsm.sys >>>>> ... >>>>> >>>>> DOMAIN /IBM/GPFS >>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>> >>>>> >>>>> From: "Jaime Pinto" >>>>> To: "gpfsug main discussion list" >>> >>>>> Date: 17-05-17 23:44 >>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors >>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>> >>>>> >>>>> >>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>> * project3 >>>>> * scratch3 >>>>> * sysadmin3 >>>>> >>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>> have no need or space to include *scratch3* on TSM. >>>>> >>>>> Question: how to craft the mmbackup command to backup >>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>> >>>>> Below are 3 types of errors: >>>>> >>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. 
>>>>> >>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>> dependent fileset sysadmin3 is not supported >>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>> fileset level backup. exit 1 >>>>> >>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>> >>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem >>>>> cannot be specified at the same time. >>>>> >>>>> These examples don't really cover my case: >>>>> >>> >> > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > >> >>> >>>>> >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>>> Toronto. >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>> Oy IBM Finland Ab >>>>> PL 265, 00101 Helsinki, Finland >>>>> Business ID, Y-tunnus: 0195876-3 >>>>> Registered in Finland >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. 
>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>> Oy IBM Finland Ab >>> PL 265, 00101 Helsinki, Finland >>> Business ID, Y-tunnus: 0195876-3 >>> Registered in Finland >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From jtucker at pixitmedia.com Thu May 18 20:32:54 2017 From: jtucker at pixitmedia.com (Jez Tucker) Date: Thu, 18 May 2017 20:32:54 +0100 Subject: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca> <20170517164447.183018kfjbol2jwf@support.scinet.utoronto.ca> <20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca> <20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca> <20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca> <20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Hi When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is > using as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. 
n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > >> 1. As I surmised, and I now have verification from Mr. mmbackup, >> mmbackup >> wants to support incremental backups (using what it calls its shadow >> database) and keep both your sanity and its sanity -- so mmbackup limits >> you to either full filesystem or full inode-space (independent fileset.) >> If you want to do something else, okay, but you have to be careful >> and be >> sure of yourself. IBM will not be able to jump in and help you if and >> when >> it comes time to restore and you discover that your backup(s) were not >> complete. >> >> 2. If you decide you're a big boy (or woman or XXX) and want to do some >> hacking ... Fine... But even then, I suggest you do the smallest hack >> that will mostly achieve your goal... >> DO NOT think you can create a custom policy rules list for mmbackup >> out of >> thin air.... Capture the rules mmbackup creates and make small >> changes to >> that -- >> And as with any disaster recovery plan..... Plan your Test and Test >> your >> Plan.... Then do some dry run recoveries before you really "need" to >> do a >> real recovery. >> >> I only even sugest this because Jaime says he has a huge filesystem with >> several dependent filesets and he really, really wants to do a partial >> backup, without first copying or re-organizing the filesets. >> >> HMMM.... otoh... if you have one or more dependent filesets that are >> smallish, and/or you don't need the backups -- create independent >> filesets, copy/move/delete the data, rename, voila. >> >> >> >> From: "Jaime Pinto" >> To: "Marc A Kaplan" >> Cc: "gpfsug main discussion list" >> Date: 05/18/2017 12:36 PM >> Subject: Re: [gpfsug-discuss] What is an independent fileset? >> was: >> mmbackup with fileset : scope errors >> >> >> >> Marc >> >> The -P option may be a very good workaround, but I still have to test >> it. >> >> I'm currently trying to craft the mm rule, as minimalist as possible, >> however I'm not sure about what attributes mmbackup expects to see. >> >> Below is my first attempt. It would be nice to get comments from >> somebody familiar with the inner works of mmbackup. >> >> Thanks >> Jaime >> >> >> /* A macro to abbreviate VARCHAR */ >> define([vc],[VARCHAR($1)]) >> >> /* Define three external lists */ >> RULE EXTERNAL LIST 'allfiles' EXEC >> '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' >> >> /* Generate a list of all files, directories, plus all other file >> system objects, >> like symlinks, named pipes, etc. 
Include the owner's id with each >> object and >> sort them by the owner's id */ >> >> RULE 'r1' LIST 'allfiles' >> DIRECTORIES_PLUS >> SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || >> vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) >> FROM POOL 'system' >> FOR FILESET('sysadmin3') >> >> /* Files in special filesets, such as those excluded, are never >> traversed >> */ >> RULE 'ExcSpecialFile' EXCLUDE >> FOR FILESET('scratch3','project3') >> >> >> >> >> >> Quoting "Marc A Kaplan" : >> >>> Jaime, >>> >>> While we're waiting for the mmbackup expert to weigh in, notice that >> the >>> mmbackup command does have a -P option that allows you to provide a >>> customized policy rules file. >>> >>> So... a fairly safe hack is to do a trial mmbackup run, capture the >>> automatically generated policy file, and then augment it with FOR >>> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup >> for >>> real with your customized policy file. >>> >>> mmbackup uses mmapplypolicy which by itself is happy to limit its >>> directory scan to a particular fileset by using >>> >>> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope >>> fileset .... >>> >>> However, mmbackup probably has other worries and for simpliciity and >>> helping make sure you get complete, sensible backups, apparently has >>> imposed some restrictions to preserve sanity (yours and our support >> team! >>> ;-) ) ... (For example, suppose you were doing incremental backups, >>> starting at different paths each time? -- happy to do so, but when >>> disaster strikes and you want to restore -- you'll end up confused >> and/or >>> unhappy!) >>> >>> "converting from one fileset to another" --- sorry there is no such >> thing. >>> Filesets are kinda like little filesystems within filesystems. Moving >> a >>> file from one fileset to another requires a copy operation. There is >> no >>> fast move nor hardlinking. >>> >>> --marc >>> >>> >>> >>> From: "Jaime Pinto" >>> To: "gpfsug main discussion list" >> , >>> "Marc A Kaplan" >>> Date: 05/18/2017 09:58 AM >>> Subject: Re: [gpfsug-discuss] What is an independent fileset? >> was: >>> mmbackup with fileset : scope errors >>> >>> >>> >>> Thanks for the explanation Mark and Luis, >>> >>> It begs the question: why filesets are created as dependent by >>> default, if the adverse repercussions can be so great afterward? Even >>> in my case, where I manage GPFS and TSM deployments (and I have been >>> around for a while), didn't realize at all that not adding and extra >>> option at fileset creation time would cause me huge trouble with >>> scaling later on as I try to use mmbackup. >>> >>> When you have different groups to manage file systems and backups that >>> don't read each-other's manuals ahead of time then we have a really >>> bad recipe. >>> >>> I'm looking forward to your explanation as to why mmbackup cares one >>> way or another. >>> >>> I'm also hoping for a hint as to how to configure backup exclusion >>> rules on the TSM side to exclude fileset traversing on the GPFS side. >>> Is mmbackup smart enough (actually smarter than TSM client itself) to >>> read the exclusion rules on the TSM configuration and apply them >>> before traversing? >>> >>> Thanks >>> Jaime >>> >>> Quoting "Marc A Kaplan" : >>> >>>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always >>> think >>>> and try to read that as "inode space". 
>>>> >>>> An "independent fileset" has all the attributes of an >>>> (older-fashioned) >>>> dependent fileset PLUS all of its files are represented by inodes that >>> are >>>> in a separable range of inode numbers - this allows GPFS to >>>> efficiently >>> do >>>> snapshots of just that inode-space (uh... independent fileset)... >>>> >>>> And... of course the files of dependent filesets must also be >>> represented >>>> by inodes -- those inode numbers are within the inode-space of >>>> whatever >>>> the containing independent fileset is... as was chosen when you >>>> created >>>> the fileset.... If you didn't say otherwise, inodes come from the >>>> default "root" fileset.... >>>> >>>> Clear as your bath-water, no? >>>> >>>> So why does mmbackup care one way or another ??? Stay tuned.... >>>> >>>> BTW - if you look at the bits of the inode numbers carefully --- you >> may >>>> not immediately discern what I mean by a "separable range of inode >>>> numbers" -- (very technical hint) you may need to permute the bit >>>> order >>>> before you discern a simple pattern... >>>> >>>> >>>> >>>> From: "Luis Bolinches" >>>> To: gpfsug-discuss at spectrumscale.org >>>> Cc: gpfsug-discuss at spectrumscale.org >>>> Date: 05/18/2017 02:10 AM >>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope >>> errors >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> >>>> >>>> >>>> Hi >>>> >>>> There is no direct way to convert the one fileset that is dependent to >>>> independent or viceversa. >>>> >>>> I would suggest to take a look to chapter 5 of the 2014 redbook, lots >> of >>>> definitions about GPFS ILM including filesets >>>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the >> only >>>> place that is explained but I honestly believe is a good single start >>>> point. It also needs an update as does nto have anything on CES nor >> ESS, >>>> so anyone in this list feel free to give feedback on that page people >>> with >>>> funding decisions listen there. >>>> >>>> So you are limited to either migrate the data from that fileset to a >> new >>>> independent fileset (multiple ways to do that) or use the TSM client >>>> config. >>>> >>>> ----- Original message ----- >>>> From: "Jaime Pinto" >>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>> To: "gpfsug main discussion list" , >>>> "Jaime Pinto" >>>> Cc: >>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors >>>> Date: Thu, May 18, 2017 4:43 AM >>>> >>>> There is hope. See reference link below: >>>> >>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm >> >> >>> >>>> >>>> >>>> The issue has to do with dependent vs. independent filesets, something >>>> I didn't even realize existed until now. Our filesets are dependent >>>> (for no particular reason), so I have to find a way to turn them into >>>> independent. >>>> >>>> The proper option syntax is "--scope inodespace", and the error >>>> message actually flagged that out, however I didn't know how to >>>> interpret what I saw: >>>> >>>> >>>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>> -------------------------------------------------------- >>>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 >>>> 21:27:43 EDT 2017. 
>>>> -------------------------------------------------------- >>>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* >>>> fileset sysadmin3 is not supported >>>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for >>>> fileset level backup. exit 1 >>>> -------------------------------------------------------- >>>> >>>> Will post the outcome. >>>> Jaime >>>> >>>> >>>> >>>> Quoting "Jaime Pinto" : >>>> >>>>> Quoting "Luis Bolinches" : >>>>> >>>>>> Hi >>>>>> >>>>>> have you tried to add exceptions on the TSM client config file? >>>>> >>>>> Hey Luis, >>>>> >>>>> That would work as well (mechanically), however it's not elegant or >>>>> efficient. When you have over 1PB and 200M files on scratch it will >>>>> take many hours and several helper nodes to traverse that fileset >>>>> just >>>>> to be negated by TSM. In fact exclusion on TSM are just as >> inefficient. >>>>> Considering that I want to keep project and sysadmin on different >>>>> domains then it's much worst, since we have to traverse and exclude >>>>> scratch & (project|sysadmin) twice, once to capture sysadmin and >>>>> again >>>>> to capture project. >>>>> >>>>> If I have to use exclusion rules it has to rely sole on gpfs rules, >> and >>>>> somehow not traverse scratch at all. >>>>> >>>>> I suspect there is a way to do this properly, however the examples on >>>>> the gpfs guide and other references are not exhaustive. They only >>>>> show >>>>> a couple of trivial cases. >>>>> >>>>> However my situation is not unique. I suspect there are may >>>>> facilities >>>>> having to deal with backup of HUGE filesets. >>>>> >>>>> So the search is on. >>>>> >>>>> Thanks >>>>> Jaime >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is >>>> linked >>>>>> on /IBM/GPFS/FSET1 >>>>>> >>>>>> dsm.sys >>>>>> ... >>>>>> >>>>>> DOMAIN /IBM/GPFS >>>>>> EXCLUDE.DIR /IBM/GPFS/FSET1 >>>>>> >>>>>> >>>>>> From: "Jaime Pinto" >>>>>> To: "gpfsug main discussion list" >>>> >>>>>> Date: 17-05-17 23:44 >>>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope >>>>>> errors >>>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>>>>> >>>>>> >>>>>> >>>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: >>>>>> * project3 >>>>>> * scratch3 >>>>>> * sysadmin3 >>>>>> >>>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we >>>>>> have no need or space to include *scratch3* on TSM. >>>>>> >>>>>> Question: how to craft the mmbackup command to backup >>>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? >>>>>> >>>>>> Below are 3 types of errors: >>>>>> >>>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope >>>>>> filesystem >>>>>> cannot be specified at the same time. >>>>>> >>>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --scope inodespace --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up >>>>>> dependent fileset sysadmin3 is not supported >>>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for >>>>>> fileset level backup. exit 1 >>>>>> >>>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm >>>>>> --scope filesystem --tsm-errorlog $logfile -L 2 >>>>>> >>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope >>>>>> filesystem >>>>>> cannot be specified at the same time. 
>>>>>> >>>>>> These examples don't really cover my case: >>>>>> >>>> >>> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples >> >> >>> >>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> Jaime >>>>>> >>>>>> >>>>>> ************************************ >>>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>>> http://www.scinethpc.ca/testimonials >>>>>> ************************************ >>>>>> --- >>>>>> Jaime Pinto >>>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>>> University of Toronto >>>>>> 661 University Ave. (MaRS), Suite 1140 >>>>>> Toronto, ON, M5G1M1 >>>>>> P: 416-978-2755 >>>>>> C: 416-505-1477 >>>>>> >>>>>> ---------------------------------------------------------------- >>>>>> This message was sent using IMP at SciNet Consortium, University of >>>>>> Toronto. >>>>>> >>>>>> _______________________________________________ >>>>>> gpfsug-discuss mailing list >>>>>> gpfsug-discuss at spectrumscale.org >>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Ellei edell? ole toisin mainittu: / Unless stated otherwise above: >>>>>> Oy IBM Finland Ab >>>>>> PL 265, 00101 Helsinki, Finland >>>>>> Business ID, Y-tunnus: 0195876-3 >>>>>> Registered in Finland >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ************************************ >>>>> TELL US ABOUT YOUR SUCCESS STORIES >>>>> http://www.scinethpc.ca/testimonials >>>>> ************************************ >>>>> --- >>>>> Jaime Pinto >>>>> SciNet HPC Consortium - Compute/Calcul Canada >>>>> www.scinet.utoronto.ca - www.computecanada.ca >>>>> University of Toronto >>>>> 661 University Ave. (MaRS), Suite 1140 >>>>> Toronto, ON, M5G1M1 >>>>> P: 416-978-2755 >>>>> C: 416-505-1477 >>>>> >>>>> ---------------------------------------------------------------- >>>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ************************************ >>>> TELL US ABOUT YOUR SUCCESS STORIES >>>> http://www.scinethpc.ca/testimonials >>>> ************************************ >>>> --- >>>> Jaime Pinto >>>> SciNet HPC Consortium - Compute/Calcul Canada >>>> www.scinet.utoronto.ca - www.computecanada.ca >>>> University of Toronto >>>> 661 University Ave. (MaRS), Suite 1140 >>>> Toronto, ON, M5G1M1 >>>> P: 416-978-2755 >>>> C: 416-505-1477 >>>> >>>> ---------------------------------------------------------------- >>>> This message was sent using IMP at SciNet Consortium, University of >>>> Toronto. >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> Ellei edell? 
ole toisin mainittu: / Unless stated otherwise above: >>>> Oy IBM Finland Ab >>>> PL 265, 00101 Helsinki, Finland >>>> Business ID, Y-tunnus: 0195876-3 >>>> Registered in Finland >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> ************************************ >>> TELL US ABOUT YOUR SUCCESS STORIES >>> http://www.scinethpc.ca/testimonials >>> ************************************ >>> --- >>> Jaime Pinto >>> SciNet HPC Consortium - Compute/Calcul Canada >>> www.scinet.utoronto.ca - www.computecanada.ca >>> University of Toronto >>> 661 University Ave. (MaRS), Suite 1140 >>> Toronto, ON, M5G1M1 >>> P: 416-978-2755 >>> C: 416-505-1477 >>> >>> ---------------------------------------------------------------- >>> This message was sent using IMP at SciNet Consortium, University of >>> Toronto. >>> >>> >>> >>> >>> >>> >> >> >> >> >> >> >> ************************************ >> TELL US ABOUT YOUR SUCCESS STORIES >> http://www.scinethpc.ca/testimonials >> ************************************ >> --- >> Jaime Pinto >> SciNet HPC Consortium - Compute/Calcul Canada >> www.scinet.utoronto.ca - www.computecanada.ca >> University of Toronto >> 661 University Ave. (MaRS), Suite 1140 >> Toronto, ON, M5G1M1 >> P: 416-978-2755 >> C: 416-505-1477 >> >> ---------------------------------------------------------------- >> This message was sent using IMP at SciNet Consortium, University of >> Toronto. >> >> >> >> >> >> > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- *Jez Tucker* Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sfadden at us.ibm.com Thu May 18 22:46:49 2017 From: sfadden at us.ibm.com (Scott Fadden) Date: Thu, 18 May 2017 21:46:49 +0000 Subject: [gpfsug-discuss] Introduction In-Reply-To: Message-ID: Welcome! 
On May 17, 2017, 4:27:15 AM, neil.wilson at metoffice.gov.uk wrote: From: neil.wilson at metoffice.gov.uk To: gpfsug-discuss at spectrumscale.org Cc: Date: May 17, 2017 4:27:15 AM Subject: [gpfsug-discuss] Introduction Hi All, I help to run a gpfs cluster at the Met Office, Exeter, UK. The cluster is running GPFS 4.2.2.2, it?s used with slurm for batch work - primarily for postprocessing weather and climate change model data generated from our HPC. We currently have 8 NSD nodes with approx 3PB of storage with 70+ client nodes. Kind Regards Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu May 18 22:55:34 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 18 May 2017 21:55:34 +0000 Subject: [gpfsug-discuss] RPM Packages Message-ID: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? Thanks Simon From makaplan at us.ibm.com Fri May 19 14:50:20 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 19 May 2017 09:50:20 -0400 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Easier than hacking mmbackup or writing/editing policy rules, mmbackup interprets your TSM INCLUDE/EXCLUDE configuration statements -- so that is a supported and recommended way of doing business... If that doesn't do it for your purposes... You're into some light hacking... So look inside the mmbackup and tsbackup33 scripts and you'll find some DEBUG variables that should allow for keeping work and temp files around ... including the generated policy rules. I'm calling this hacking "light", because I don't think you'll need to change the scripts, but just look around and see how you can use what's there to achieve your legitimate purposes. Even so, you will have crossed a line where IBM support is "informal" at best. From: Jez Tucker To: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 03:33 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez On 18/05/17 20:02, Jaime Pinto wrote: Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. 
Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" : 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air.... Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan..... Plan your Test and Test your Plan.... Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM.... otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" To: "Marc A Kaplan" Cc: "gpfsug main discussion list" Date: 05/18/2017 12:36 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" : Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... 
a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset .... However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/18/2017 09:58 AM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" : When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset.... If you didn't say otherwise, inodes come from the default "root" fileset.... Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned.... BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... 
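One quick way to check which kind of fileset you have before pointing mmbackup at it -- mmlsfileset with -L prints an inode-space column, so an independent fileset shows its own inode space while a dependent one reports the inode space of its parent (the filesystem name below is illustrative):

# List filesets with their inode space assignments
mmlsfileset sgfs1 -L

# Or inspect a single fileset
mmlsfileset sgfs1 sysadmin3 -L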
From: "Luis Bolinches" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 05/18/2017 02:10 AM Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. ----- Original message ----- From: "Jaime Pinto" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug main discussion list" , "Jaime Pinto" Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 -------------------------------------------------------- mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. -------------------------------------------------------- Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 -------------------------------------------------------- Will post the outcome. Jaime Quoting "Jaime Pinto" : Quoting "Luis Bolinches" : Hi have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... 
DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" To: "gpfsug main discussion list" Date: 17-05-17 23:44 Subject: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by: gpfsug-discuss-bounces at spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Jez Tucker Head of Research and Development, Pixit Media 07764193820 | jtucker at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Fri May 19 17:12:20 2017 From: Mark.Bush at siriuscom.com (Mark Bush) Date: Fri, 19 May 2017 16:12:20 +0000 Subject: [gpfsug-discuss] RPM Packages Message-ID: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). 
Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From jonathon.anderson at colorado.edu Fri May 19 17:16:50 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Fri, 19 May 2017 16:16:50 +0000 Subject: [gpfsug-discuss] RPM Packages In-Reply-To: References: Message-ID: Data Management Edition optionally replaces the traditional GPFS licensing model with a per-terabyte licensing fee, rather than a per-socket licensing fee. https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS216-158 Presumably installing this RPM is how you tell GPFS which licensing model you?re using. ~jonathon On 5/19/17, 10:12 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mark Bush" wrote: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri May 19 17:43:49 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 19 May 2017 16:43:49 +0000 Subject: [gpfsug-discuss] RPM Packages In-Reply-To: References: , Message-ID: Well, I installed it one node and it still claims that it's advanced licensed on the node (only after installing gpfs.adv of course). I know the license model for DME, but we've never installed the gpfs.license.standard packages before. I agree the XML string pro ably is used somewhere, just not clear if it's needed or not... My guess would be maybe the GUI uses it. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jonathon A Anderson [jonathon.anderson at colorado.edu] Sent: 19 May 2017 17:16 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] RPM Packages Data Management Edition optionally replaces the traditional GPFS licensing model with a per-terabyte licensing fee, rather than a per-socket licensing fee. https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS216-158 Presumably installing this RPM is how you tell GPFS which licensing model you?re using. ~jonathon On 5/19/17, 10:12 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mark Bush" wrote: For what it?s worth, I have been running 4.2.3 DM for a few weeks in a test lab and didn?t have the gpfs.license.dm package installed and everything worked fine (GUI, CES, etc, etc). Here?s what rpm says about itself [root at node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm /usr/lpp/mmfs /usr/lpp/mmfs/bin /usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag This file seems to be some XML code with strings of numbers in it. Not sure what it does for you. Mark On 5/18/17, 4:55 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (IT Research Support)" wrote: Hi All, Normally we never use the install toolkit, but deploy GPFS from a config management tool. I see there are now RPMs such as gpfs.license.dm, are these actually required to be installed? Everything seems to work well without them, so just interested. Maybe the GUI uses them? 
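For reference, a quick way to see what the edition package actually puts on a node (a sketch only; the swidtag path is taken from the rpm -qpl output earlier in the thread, and the per-node client/server designation reported by mmlslicense is a separate thing from the edition tag):

# which edition package, if any, is installed
rpm -qa 'gpfs.license*'
# the package appears to ship only an ISO/IEC 19770-2 software identification tag
cat /usr/lpp/mmfs/properties/version/*.swidtag
# per-node licence designations are managed by mmchlicense and shown here
mmlslicense -L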
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From tpathare at sidra.org Sun May 21 09:40:42 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 08:40:42 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue Message-ID: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid 
: 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun May 21 09:59:38 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Sun, 21 May 2017 08:59:38 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Message-ID: Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
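Before tuning anything, a short sketch of how the waiters and the current settings can be inspected on an affected node (mmdiag is the supported interface; the VERBS dump quoted below appears to be output of the unsupported mmfsadm service tool, so treat its format as subject to change):

# long-running waiters, including 'waiting for conn rdmas < conn maxrdmas'
mmdiag --waiters
# RDMA-related configuration currently in effect on this node
mmdiag --config | grep -i verbsRdma
# state dump similar to the one quoted below (service-level, unsupported)
mmfsadm dump verbs
# RDMA errors logged by the daemon, as Aaron suggests
grep IBV_ /var/adm/ras/mmfs.log.latest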
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tpathare at sidra.org Sun May 21 10:18:11 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 09:18:11 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> Message-ID: <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpathare at sidra.org Sun May 21 10:19:23 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 09:19:23 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> Message-ID: <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: Tushar Pathare Date: Sunday, May 21, 2017 at 12:18 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
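IBV_WC_RETRY_EXC_ERR is frequently a fabric-side symptom (marginal cable, port errors, congested or flapping links) rather than a Scale problem as such, so alongside the grep of mmfs.log it is worth checking the InfiniBand fabric itself. A hedged sketch using the usual infiniband-diags utilities (upstream tool names; availability depends on your OFED/MOFED install):

# local HCA and port state/rate
ibstat
# per-port error counters across the fabric
ibqueryerrors
# link width/speed of every link, to spot degraded links
iblinkinfo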
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Sun May 21 15:36:02 2017 From: oehmes at gmail.com (Sven Oehme) Date: Sun, 21 May 2017 14:36:02 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Message-ID: The reason is the default setting of : verbsRdmasPerConnection : 16 you can increase this , on smaller clusters i run on some with 1024, but its not advised to run this on 100's of nodes and not if you know exactly what you are doing. i would start by doubling it to 32 and see how much of the waiters disappear, then go to 64 if you still see too many. don't go beyond 128 unless somebody knowledgeable reviewed your config further going to 32 or 64 is very low risk if you already run with verbs send enabled and don't have issues. On Sun, May 21, 2017 at 2:19 AM Tushar Pathare wrote: > Hello Aaron, > > Yes we saw recently an issue with > > > > VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 > (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 > > And > > > > VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( > sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 > > > > Thanks > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > > > *From: *Tushar Pathare > *Date: *Sunday, May 21, 2017 at 12:18 PM > > > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] VERBS RDMA issue > > > > Hello Aaron, > > Yes we saw recently an issue with > > > > VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 > (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 > > And > > > > > > > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > > > *From: * on behalf of "Knister, > Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" > *Reply-To: *gpfsug main discussion list > *Date: *Sunday, May 21, 2017 at 11:59 AM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] VERBS RDMA issue > > > > Hi Tushar, > > > > For me the issue was an underlying performance bottleneck (some CPU > frequency scaling problems causing cores to throttle back when it wasn't > appropriate). > > > > I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the > past to turn this off under certain conditions although I don't remember > what those where. Hopefully others can chime in and qualify that. > > > > > Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the > mmfs.log). > > > > > -Aaron > > > > > > On May 21, 2017 at 04:41:00 EDT, Tushar Pathare > wrote: > > Hello Team, > > > > We are facing a lot of messages waiters related to *waiting for conn > rdmas < conn maxrdmas > * > > > > Is there some recommended settings to resolve this issue.? > > Our config for RDMA is as follows for 140 nodes(32 cores each) > > > > > > VERBS RDMA Configuration: > > Status : started > > Start time : Thu > > Stats reset time : Thu > > Dump time : Sun > > mmfs verbsRdma : enable > > mmfs verbsRdmaCm : disable > > mmfs verbsPorts : mlx4_0/1 mlx4_0/2 > > mmfs verbsRdmasPerNode : 3200 > > mmfs verbsRdmasPerNode (max) : 3200 > > mmfs verbsRdmasPerNodeOptimize : yes > > mmfs verbsRdmasPerConnection : 16 > > mmfs verbsRdmasPerConnection (max) : 16 > > mmfs verbsRdmaMinBytes : 16384 > > mmfs verbsRdmaRoCEToS : -1 > > mmfs verbsRdmaQpRtrMinRnrTimer : 18 > > mmfs verbsRdmaQpRtrPathMtu : 2048 > > mmfs verbsRdmaQpRtrSl : 0 > > mmfs verbsRdmaQpRtrSlDynamic : no > > mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 > > mmfs verbsRdmaQpRtsRnrRetry : 6 > > mmfs verbsRdmaQpRtsRetryCnt : 6 > > mmfs verbsRdmaQpRtsTimeout : 18 > > mmfs verbsRdmaMaxSendBytes : 16777216 > > mmfs verbsRdmaMaxSendSge : 27 > > mmfs verbsRdmaSend : yes > > mmfs verbsRdmaSerializeRecv : no > > mmfs verbsRdmaSerializeSend : no > > mmfs verbsRdmaUseMultiCqThreads : yes > > mmfs verbsSendBufferMemoryMB : 1024 > > mmfs verbsLibName : libibverbs.so > > mmfs verbsRdmaCmLibName : librdmacm.so > > mmfs verbsRdmaMaxReconnectInterval : 60 > > mmfs verbsRdmaMaxReconnectRetries : -1 > > mmfs verbsRdmaReconnectAction : disable > > mmfs verbsRdmaReconnectThreads : 32 > > mmfs verbsHungRdmaTimeout : 90 > > ibv_fork_support : true > > Max connections : 196608 > > Max RDMA size : 16777216 > > Target number of vsend buffs : 16384 > > Initial vsend buffs per conn : 59 > > nQPs : 140 > > nCQs : 282 > > nCMIDs : 0 > > nDtoThreads : 2 > > nextIndex : 141 > > Number of Devices opened : 1 > > Device : mlx4_0 > > vendor_id : 713 > > Device vendor_part_id : 4099 > > Device mem register chunk : 8589934592 <(858)%20993-4592> > (0x200000000) > > Device max_sge : 32 > > Adjusted max_sge : 0 > > Adjusted max_sge vsend : 30 > > Device max_qp_wr : 16351 > > Device max_qp_rd_atom : 16 > > Open Connect Ports : 1 > > verbsConnectPorts[0] : mlx4_0/1/0 > > lid : 129 > > state : IBV_PORT_ACTIVE > > path_mtu : 2048 > > interface ID : 0xe41d2d030073b9d1 > > sendChannel.ib_channel : 0x7FA6CB816200 > > sendChannel.dtoThreadP : 0x7FA6CB821870 > > sendChannel.dtoThreadId : 12540 > > sendChannel.nFreeCq : 1 > > recvChannel.ib_channel : 0x7FA6CB81D590 > > recvChannel.dtoThreadP : 0x7FA6CB822BA0 > > recvChannel.dtoThreadId : 12541 > > recvChannel.nFreeCq : 1 > > ibv_cq : 0x7FA2724C81F8 > > ibv_cq.cqP : 0x0 > > ibv_cq.nEvents : 0 > > ibv_cq.contextP : 0x0 > > 
ibv_cq.ib_channel : 0x0 > > > > Thanks > > > > > > *Tushar B Pathare MBA IT,BE IT* > > Bigdata & GPFS > > Software Development & Databases > > Scientific Computing > > Bioinformatics Division > > Research > > > > "What ever the mind of man can conceive and believe, drill can query" > > > > *Sidra Medical and Research Centre* > > *Sidra OPC Building* > > Sidra Medical & Research Center > > PO Box 26999 > > Al Luqta Street > > Education City North Campus > > ?Qatar Foundation, Doha, Qatar > > Office 4003 3333 ext 37443 | M +974 74793547 <+974%207479%203547> > > tpathare at sidra.org | www.sidra.org > > > > Disclaimer: This email and its attachments may be confidential and are > intended solely for the use of the individual to whom it is addressed. If > you are not the intended recipient, any reading, printing, storage, > disclosure, copying or any other action taken in respect of this e-mail is > prohibited and may be unlawful. If you are not the intended recipient, > please notify the sender immediately by using the reply function and then > permanently delete what you have received. Any views or opinions expressed > are solely those of the author and do not necessarily represent those of > Sidra Medical and Research Center. > > Disclaimer: This email and its attachments may be confidential and are > intended solely for the use of the individual to whom it is addressed. If > you are not the intended recipient, any reading, printing, storage, > disclosure, copying or any other action taken in respect of this e-mail is > prohibited and may be unlawful. If you are not the intended recipient, > please notify the sender immediately by using the reply function and then > permanently delete what you have received. Any views or opinions expressed > are solely those of the author and do not necessarily represent those of > Sidra Medical and Research Center. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpathare at sidra.org Sun May 21 16:56:40 2017 From: tpathare at sidra.org (Tushar Pathare) Date: Sun, 21 May 2017 15:56:40 +0000 Subject: [gpfsug-discuss] VERBS RDMA issue In-Reply-To: References: <76691794-CAE1-4CBF-AC17-E170B5A7FB37@sidra.org> <77E9AE64-13D3-4572-A85E-52728066C45F@sidra.org> <31425940-A928-455A-83DD-DE2F9F387AC6@sidra.org> Message-ID: Thanks Sven. Will read more about it and discuss with the team to come to a conclusion Thank you for pointing out the param. Will let you know the results after the tuning. Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: on behalf of Sven Oehme Reply-To: gpfsug main discussion list Date: Sunday, May 21, 2017 at 5:36 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] VERBS RDMA issue The reason is the default setting of : verbsRdmasPerConnection : 16 you can increase this , on smaller clusters i run on some with 1024, but its not advised to run this on 100's of nodes and not if you know exactly what you are doing. 
i would start by doubling it to 32 and see how much of the waiters disappear, then go to 64 if you still see too many. don't go beyond 128 unless somebody knowledgeable reviewed your config further going to 32 or 64 is very low risk if you already run with verbs send enabled and don't have issues. On Sun, May 21, 2017 at 2:19 AM Tushar Pathare > wrote: Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 ( sidra.snode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 136 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: Tushar Pathare > Date: Sunday, May 21, 2017 at 12:18 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hello Aaron, Yes we saw recently an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org From: > on behalf of "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > Reply-To: gpfsug main discussion list > Date: Sunday, May 21, 2017 at 11:59 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] VERBS RDMA issue Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions although I don't remember what those where. Hopefully others can chime in and qualify that. Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the mmfs.log). -Aaron On May 21, 2017 at 04:41:00 EDT, Tushar Pathare > wrote: Hello Team, We are facing a lot of messages waiters related to waiting for conn rdmas < conn maxrdmas Is there some recommended settings to resolve this issue.? 
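A minimal sketch of applying Sven's suggestion above (the mmchconfig syntax is standard, but whether a changed verbs* value is picked up without a daemon restart varies by level, so plan for a rolling mmshutdown/mmstartup of the affected nodes to be safe):

# raise the per-connection RDMA limit; add -N to target a node class instead of the whole cluster
mmchconfig verbsRdmasPerConnection=32
# confirm the value the daemon is using
mmdiag --config | grep -i verbsRdmasPerConnection
# then watch whether the 'waiting for conn rdmas < conn maxrdmas' waiters shrink
mmdiag --waiters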
Our config for RDMA is as follows for 140 nodes(32 cores each) VERBS RDMA Configuration: Status : started Start time : Thu Stats reset time : Thu Dump time : Sun mmfs verbsRdma : enable mmfs verbsRdmaCm : disable mmfs verbsPorts : mlx4_0/1 mlx4_0/2 mmfs verbsRdmasPerNode : 3200 mmfs verbsRdmasPerNode (max) : 3200 mmfs verbsRdmasPerNodeOptimize : yes mmfs verbsRdmasPerConnection : 16 mmfs verbsRdmasPerConnection (max) : 16 mmfs verbsRdmaMinBytes : 16384 mmfs verbsRdmaRoCEToS : -1 mmfs verbsRdmaQpRtrMinRnrTimer : 18 mmfs verbsRdmaQpRtrPathMtu : 2048 mmfs verbsRdmaQpRtrSl : 0 mmfs verbsRdmaQpRtrSlDynamic : no mmfs verbsRdmaQpRtrSlDynamicTimeout : 10 mmfs verbsRdmaQpRtsRnrRetry : 6 mmfs verbsRdmaQpRtsRetryCnt : 6 mmfs verbsRdmaQpRtsTimeout : 18 mmfs verbsRdmaMaxSendBytes : 16777216 mmfs verbsRdmaMaxSendSge : 27 mmfs verbsRdmaSend : yes mmfs verbsRdmaSerializeRecv : no mmfs verbsRdmaSerializeSend : no mmfs verbsRdmaUseMultiCqThreads : yes mmfs verbsSendBufferMemoryMB : 1024 mmfs verbsLibName : libibverbs.so mmfs verbsRdmaCmLibName : librdmacm.so mmfs verbsRdmaMaxReconnectInterval : 60 mmfs verbsRdmaMaxReconnectRetries : -1 mmfs verbsRdmaReconnectAction : disable mmfs verbsRdmaReconnectThreads : 32 mmfs verbsHungRdmaTimeout : 90 ibv_fork_support : true Max connections : 196608 Max RDMA size : 16777216 Target number of vsend buffs : 16384 Initial vsend buffs per conn : 59 nQPs : 140 nCQs : 282 nCMIDs : 0 nDtoThreads : 2 nextIndex : 141 Number of Devices opened : 1 Device : mlx4_0 vendor_id : 713 Device vendor_part_id : 4099 Device mem register chunk : 8589934592 (0x200000000) Device max_sge : 32 Adjusted max_sge : 0 Adjusted max_sge vsend : 30 Device max_qp_wr : 16351 Device max_qp_rd_atom : 16 Open Connect Ports : 1 verbsConnectPorts[0] : mlx4_0/1/0 lid : 129 state : IBV_PORT_ACTIVE path_mtu : 2048 interface ID : 0xe41d2d030073b9d1 sendChannel.ib_channel : 0x7FA6CB816200 sendChannel.dtoThreadP : 0x7FA6CB821870 sendChannel.dtoThreadId : 12540 sendChannel.nFreeCq : 1 recvChannel.ib_channel : 0x7FA6CB81D590 recvChannel.dtoThreadP : 0x7FA6CB822BA0 recvChannel.dtoThreadId : 12541 recvChannel.nFreeCq : 1 ibv_cq : 0x7FA2724C81F8 ibv_cq.cqP : 0x0 ibv_cq.nEvents : 0 ibv_cq.contextP : 0x0 ibv_cq.ib_channel : 0x0 Thanks Tushar B Pathare MBA IT,BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division Research "What ever the mind of man can conceive and believe, drill can query" Sidra Medical and Research Centre Sidra OPC Building Sidra Medical & Research Center PO Box 26999 Al Luqta Street Education City North Campus ?Qatar Foundation, Doha, Qatar Office 4003 3333 ext 37443 | M +974 74793547 tpathare at sidra.org | www.sidra.org Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. 
If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Sidra Medical and Research Center. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed May 24 10:43:37 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 24 May 2017 09:43:37 +0000 Subject: [gpfsug-discuss] Report on Scale and Cloud Message-ID: Hi All, I forgot that I never circulated, as part of the RCUK Working Group on Cloud, we produced a report on using Scale with Cloud/Undercloud ... You can download the report from: https://cloud.ac.uk/reports/spectrumscale/ We had some input from various IBM people whilst writing, and bear in mind that its a snapshot of support at the point in time when it was written. Simon From kkr at lbl.gov Wed May 24 20:57:49 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 24 May 2017 12:57:49 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Message-ID: Hello, We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the ZIMonAddress variable in /usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. 
Thanks, Kristy The peers are added into the ZIMonCollector.cfg using the default port 9085: peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" } And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. cfg: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "hostA.nersc.gov " port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:03:54 0.54 3.67 4961 2 2017-05-23-17:03:55 0.63 3.55 6199 3 2017-05-23-17:03:56 1.59 3.76 7914 4 2017-05-23-17:03:57 1.38 5.34 5393 5 2017-05-23-17:03:58 0.54 2.21 2435 6 2017-05-23-17:03:59 0.13 0.29 2519 7 2017-05-23-17:04:00 0.13 0.25 2197 8 2017-05-23-17:04:01 0.13 0.29 2473 9 2017-05-23-17:04:02 0.08 0.21 2336 10 2017-05-23-17:04:03 0.13 0.21 2312 [root@ hostB ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:04:07 0.13 0.21 2010 2 2017-05-23-17:04:08 0.04 0.21 2571 3 2017-05-23-17:04:09 0.08 0.25 2766 4 2017-05-23-17:04:10 0.13 0.29 3147 5 2017-05-23-17:04:11 0.83 0.83 2596 6 2017-05-23-17:04:12 0.33 0.54 2530 7 2017-05-23-17:04:13 0.08 0.33 2428 8 2017-05-23-17:04:14 0.13 0.25 2326 9 2017-05-23-17:04:15 0.13 0.29 4190 10 2017-05-23-17:04:16 0.58 1.92 5882 [root@ hostB ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:45 0.33 0.46 7460 2 2017-05-23-17:05:46 0.33 0.42 8993 3 2017-05-23-17:05:47 0.42 0.54 8709 4 2017-05-23-17:05:48 0.38 0.5 5923 5 2017-05-23-17:05:49 0.54 1.46 7381 6 2017-05-23-17:05:50 0.58 3.51 10381 7 2017-05-23-17:05:51 1.05 1.13 10995 8 2017-05-23-17:05:52 0.88 0.92 10855 9 2017-05-23-17:05:53 0.5 0.63 10958 10 2017-05-23-17:05:54 0.5 0.59 10285 [root@ hostA ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:50 0.58 3.51 10381 2 2017-05-23-17:05:51 1.05 1.13 10995 3 2017-05-23-17:05:52 0.88 0.92 10855 4 2017-05-23-17:05:53 0.5 0.63 10958 5 2017-05-23-17:05:54 0.5 0.59 10285 6 2017-05-23-17:05:55 0.46 0.63 11621 7 2017-05-23-17:05:56 0.84 0.92 11477 8 2017-05-23-17:05:57 1.47 1.88 11084 9 2017-05-23-17:05:58 0.46 1.76 9125 10 2017-05-23-17:05:59 0.42 0.63 11745 -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylorm at us.ibm.com Thu May 25 14:46:06 2017 From: taylorm at us.ibm.com (Michael L Taylor) Date: Thu, 25 May 2017 06:46:06 -0700 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi Kristy, At first glance your config looks ok. Here are a few things to check. Is 4.2.3 the first time you have installed and configured performance monitoring? Or have you configured it at some version < 4.2.3 and then upgraded to 4.2.3? 
Did you restart pmcollector after changing the configuration? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_guienableperfmon.htm "Configure peer configuration for the collectors. The collector configuration is stored in the /opt/IBM/zimon/ZIMonCollector.cfg file. This file defines collector peer configuration and the aggregation rules. If you are using only a single collector, you can skip this step. Restart the pmcollector service after making changes to the configuration file. The GUI must have access to all data from each GUI node. " Firewall ports are open for performance monitoring and MGMT GUI? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforgui.htm?cp=STXKQY https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforPMT.htm Did you setup the collectors with : prompt# mmperfmon config generate --collectors collector1.domain.com,collector2.domain.com,? Once the configuration file has been stored within IBM Spectrum Scale, it can be activated as follows. prompt# mmchnode --perfmon ?N nodeclass1,nodeclass2,? Perhaps once you make sure the federated mode is set between hostA and hostB as you like then 'systemctl restart pmcollector' and then 'systemctl restart gpfsgui' on both nodes? From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 05/24/2017 12:58 PM Subject: gpfsug-discuss Digest, Vol 64, Issue 61 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. SS Metrics (Zimon) and SS GUI, Federation not working (Kristy Kallback-Rose) ---------------------------------------------------------------------- Message: 1 Date: Wed, 24 May 2017 12:57:49 -0700 From: Kristy Kallback-Rose To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Message-ID: Content-Type: text/plain; charset="utf-8" Hello, We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the ZIMonAddress variable in /usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. 
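Pulling the checklist at the top of this reply together, a hedged sketch of the federation setup and verification steps for the two collectors in this thread (hostA/hostB; run on both collector nodes unless noted, and treat it as an outline rather than a tested procedure):

# point the sensors at both collectors
mmperfmon config generate --collectors hostA.nersc.gov,hostB.nersc.gov
# enable sensors on the nodes that should report in
mmchnode --perfmon -N nodeclass1
# after adding the peers stanza to /opt/IBM/zimon/ZIMonCollector.cfg on both collectors
systemctl restart pmcollector
systemctl restart gpfsgui        # GUI node only
# both collectors should now answer the same query
mmperfmon query cpu -N hostA
mmperfmon query cpu -N hostB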
I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. Thanks, Kristy The peers are added into the ZIMonCollector.cfg using the default port 9085: peers = { host = "hostA" port = "9085" }, { host = "hostB" port = "9085" } And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. cfg: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "hostA.nersc.gov " port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov ", " hostB.nersc.gov " colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:03:54 0.54 3.67 4961 2 2017-05-23-17:03:55 0.63 3.55 6199 3 2017-05-23-17:03:56 1.59 3.76 7914 4 2017-05-23-17:03:57 1.38 5.34 5393 5 2017-05-23-17:03:58 0.54 2.21 2435 6 2017-05-23-17:03:59 0.13 0.29 2519 7 2017-05-23-17:04:00 0.13 0.25 2197 8 2017-05-23-17:04:01 0.13 0.29 2473 9 2017-05-23-17:04:02 0.08 0.21 2336 10 2017-05-23-17:04:03 0.13 0.21 2312 [root@ hostB ~]# mmperfmon query cpu -N hostB Legend: 1: hostB.nersc.gov |CPU|cpu_system 2: hostB.nersc.gov |CPU|cpu_user 3: hostB.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:04:07 0.13 0.21 2010 2 2017-05-23-17:04:08 0.04 0.21 2571 3 2017-05-23-17:04:09 0.08 0.25 2766 4 2017-05-23-17:04:10 0.13 0.29 3147 5 2017-05-23-17:04:11 0.83 0.83 2596 6 2017-05-23-17:04:12 0.33 0.54 2530 7 2017-05-23-17:04:13 0.08 0.33 2428 8 2017-05-23-17:04:14 0.13 0.25 2326 9 2017-05-23-17:04:15 0.13 0.29 4190 10 2017-05-23-17:04:16 0.58 1.92 5882 [root@ hostB ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:45 0.33 0.46 7460 2 2017-05-23-17:05:46 0.33 0.42 8993 3 2017-05-23-17:05:47 0.42 0.54 8709 4 2017-05-23-17:05:48 0.38 0.5 5923 5 2017-05-23-17:05:49 0.54 1.46 7381 6 2017-05-23-17:05:50 0.58 3.51 10381 7 2017-05-23-17:05:51 1.05 1.13 10995 8 2017-05-23-17:05:52 0.88 0.92 10855 9 2017-05-23-17:05:53 0.5 0.63 10958 10 2017-05-23-17:05:54 0.5 0.59 10285 [root@ hostA ~]# mmperfmon query cpu -N hostA Legend: 1: hostA.nersc.gov |CPU|cpu_system 2: hostA.nersc.gov |CPU|cpu_user 3: hostA.nersc.gov |CPU|cpu_contexts Row Timestamp cpu_system cpu_user cpu_contexts 1 2017-05-23-17:05:50 0.58 3.51 10381 2 2017-05-23-17:05:51 1.05 1.13 10995 3 2017-05-23-17:05:52 0.88 0.92 10855 4 2017-05-23-17:05:53 0.5 0.63 10958 5 2017-05-23-17:05:54 0.5 0.59 10285 6 2017-05-23-17:05:55 0.46 0.63 11621 7 2017-05-23-17:05:56 0.84 0.92 11477 8 2017-05-23-17:05:57 1.47 1.88 11084 9 2017-05-23-17:05:58 0.46 1.76 9125 10 2017-05-23-17:05:59 0.42 0.63 11745 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20170524/e64509b9/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 64, Issue 61 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From NSCHULD at de.ibm.com Thu May 25 15:13:16 2017 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Thu, 25 May 2017 16:13:16 +0200 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working In-Reply-To: References: Message-ID: Hi, please upgrade to 4.2.3 ptf1 - the version before has an issue with federated queries in some situations. Mit freundlichen Gr??en / Kind regards Norbert Schuld From: Kristy Kallback-Rose To: gpfsug-discuss at spectrumscale.org Date: 24/05/2017 21:58 Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, ? We have been experimenting with Zimon and the SS GUI on our dev cluster under 4.2.3. Things work well with one collector, but I'm running into issues when trying to use symmetric collector peers, i.e. federation. ? hostA and hostB are setup as both collectors and sensors with each a collector peer for the other. When this is done I can use mmperfmon to query hostA from hostA or hostB and vice versa. However, with this federation setup, the GUI fails to show data. The GUI is running on hostB. >From the collector candidate pool, hostA has been selected (automatically, not manually) as can be seen in the sensor configuration file. The GUI is unable to load data (just shows "Loading" on the graph), *unless* I change the setting of the?ZIMonAddress variable in?/usr/lpp/mmfs/gui/conf/gpfsgui.properties from localhost to hostA explicitly, it does not work if I change it to hostB explicitly. The GUI also works fine if I remove the peer entries altogether and just have one collector. ? I thought that federation meant that no matter which collector was queried the data would be returned. This appears to work for mmperfmon, but not the GUI. Can anyone advise? I also don't like the idea of having a pool of collector candidates and hard-coding one into the GUI configuration. I am including some output below to show the configs and query results. Thanks, Kristy ? The peers are added into the?ZIMonCollector.cfg using the default port 9085: ?peers = { ? ? ? ? host = "hostA" ? ? ? ? port = "9085" ?}, ?{ ? ? ? ? host = "hostB" ? ? ? ? port = "9085" ?} And the nodes are added as collector candidates, on hostA and hostB you see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors.cfg: colCandidates = "hostA.nersc.gov", "hostB.nersc.gov" colRedundancy = 1 collectors = { host = "hostA.nersc.gov" port = "4739" } Showing the config with mmperfmon config show: colCandidates = "hostA.nersc.gov", "hostB.nersc.gov" colRedundancy = 1 collectors = { host = "" Using mmperfmon I can query either host. [root at hostA ~]#? mmperfmon query cpu -N hostB Legend: ?1: hostB.nersc.gov|CPU|cpu_system ?2:?hostB.nersc.gov|CPU|cpu_user ?3:?hostB.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:03:54 ? ? ? 0.54 ? ? 3.67 ? ? 
? ? 4961 ? 2 2017-05-23-17:03:55 ? ? ? 0.63 ? ? 3.55 ? ? ? ? 6199 ? 3 2017-05-23-17:03:56 ? ? ? 1.59 ? ? 3.76 ? ? ? ? 7914 ? 4 2017-05-23-17:03:57 ? ? ? 1.38 ? ? 5.34 ? ? ? ? 5393 ? 5 2017-05-23-17:03:58 ? ? ? 0.54 ? ? 2.21 ? ? ? ? 2435 ? 6 2017-05-23-17:03:59 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 2519 ? 7 2017-05-23-17:04:00 ? ? ? 0.13 ? ? 0.25 ? ? ? ? 2197 ? 8 2017-05-23-17:04:01 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 2473 ? 9 2017-05-23-17:04:02 ? ? ? 0.08 ? ? 0.21 ? ? ? ? 2336 ?10 2017-05-23-17:04:03 ? ? ? 0.13 ? ? 0.21 ? ? ? ? 2312 [root@?hostB?~]#? mmperfmon query cpu -N?hostB Legend: ?1:?hostB.nersc.gov|CPU|cpu_system ?2:?hostB.nersc.gov|CPU|cpu_user ?3:?hostB.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:04:07 ? ? ? 0.13 ? ? 0.21 ? ? ? ? 2010 ? 2 2017-05-23-17:04:08 ? ? ? 0.04 ? ? 0.21 ? ? ? ? 2571 ? 3 2017-05-23-17:04:09 ? ? ? 0.08 ? ? 0.25 ? ? ? ? 2766 ? 4 2017-05-23-17:04:10 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 3147 ? 5 2017-05-23-17:04:11 ? ? ? 0.83 ? ? 0.83 ? ? ? ? 2596 ? 6 2017-05-23-17:04:12 ? ? ? 0.33 ? ? 0.54 ? ? ? ? 2530 ? 7 2017-05-23-17:04:13 ? ? ? 0.08 ? ? 0.33 ? ? ? ? 2428 ? 8 2017-05-23-17:04:14 ? ? ? 0.13 ? ? 0.25 ? ? ? ? 2326 ? 9 2017-05-23-17:04:15 ? ? ? 0.13 ? ? 0.29 ? ? ? ? 4190 ?10 2017-05-23-17:04:16 ? ? ? 0.58 ? ? 1.92 ? ? ? ? 5882 [root@?hostB?~]#? mmperfmon query cpu -N?hostA Legend: ?1:?hostA.nersc.gov|CPU|cpu_system ?2:?hostA.nersc.gov|CPU|cpu_user ?3:?hostA.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:05:45 ? ? ? 0.33 ? ? 0.46 ? ? ? ? 7460 ? 2 2017-05-23-17:05:46 ? ? ? 0.33 ? ? 0.42 ? ? ? ? 8993 ? 3 2017-05-23-17:05:47 ? ? ? 0.42 ? ? 0.54 ? ? ? ? 8709 ? 4 2017-05-23-17:05:48 ? ? ? 0.38? ? ? 0.5 ? ? ? ? 5923 ? 5 2017-05-23-17:05:49 ? ? ? 0.54 ? ? 1.46 ? ? ? ? 7381 ? 6 2017-05-23-17:05:50 ? ? ? 0.58 ? ? 3.51? ? ? ? 10381 ? 7 2017-05-23-17:05:51 ? ? ? 1.05 ? ? 1.13? ? ? ? 10995 ? 8 2017-05-23-17:05:52 ? ? ? 0.88 ? ? 0.92? ? ? ? 10855 ? 9 2017-05-23-17:05:53? ? ? ? 0.5 ? ? 0.63? ? ? ? 10958 ?10 2017-05-23-17:05:54? ? ? ? 0.5 ? ? 0.59? ? ? ? 10285 [root@?hostA?~]#? mmperfmon query cpu -N?hostA Legend: ?1:?hostA.nersc.gov|CPU|cpu_system ?2:?hostA.nersc.gov|CPU|cpu_user ?3:?hostA.nersc.gov|CPU|cpu_contexts Row ? ? ? ? ? Timestamp cpu_system cpu_user cpu_contexts ? 1 2017-05-23-17:05:50 ? ? ? 0.58 ? ? 3.51? ? ? ? 10381 ? 2 2017-05-23-17:05:51 ? ? ? 1.05 ? ? 1.13? ? ? ? 10995 ? 3 2017-05-23-17:05:52 ? ? ? 0.88 ? ? 0.92? ? ? ? 10855 ? 4 2017-05-23-17:05:53? ? ? ? 0.5 ? ? 0.63? ? ? ? 10958 ? 5 2017-05-23-17:05:54? ? ? ? 0.5 ? ? 0.59? ? ? ? 10285 ? 6 2017-05-23-17:05:55 ? ? ? 0.46 ? ? 0.63? ? ? ? 11621 ? 7 2017-05-23-17:05:56 ? ? ? 0.84 ? ? 0.92? ? ? ? 11477 ? 8 2017-05-23-17:05:57 ? ? ? 1.47 ? ? 1.88? ? ? ? 11084 ? 9 2017-05-23-17:05:58 ? ? ? 0.46 ? ? 1.76 ? ? ? ? 9125 ?10 2017-05-23-17:05:59 ? ? ? 0.42 ? ? 0.63? ? ? ? 11745 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From kkr at lbl.gov Thu May 25 22:51:32 2017
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Thu, 25 May 2017 14:51:32 -0700
Subject: Re: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation not working
In-Reply-To:
References:
Message-ID:

Hi Michael, Norbert,

Thanks for your replies; we did do all the setup as Michael described, and stopped and restarted services more than once ;-). I believe the issue is resolved with the PTF. I am still checking, but it seems to be working with symmetric peering between those two nodes. I will test further, expand to other nodes, and make sure it continues to work. I will report back if I run into any other issues.

Cheers,
Kristy

On Thu, May 25, 2017 at 6:46 AM, Michael L Taylor wrote:
> Hi Kristy,
> At first glance your config looks ok. Here are a few things to check.
>
> Is 4.2.3 the first time you have installed and configured performance monitoring? Or have you configured it at some version < 4.2.3 and then upgraded to 4.2.3?
>
> Did you restart pmcollector after changing the configuration?
>
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_guienableperfmon.htm
> "Configure peer configuration for the collectors. The collector configuration is stored in the /opt/IBM/zimon/ZIMonCollector.cfg file. This file defines collector peer configuration and the aggregation rules. If you are using only a single collector, you can skip this step. Restart the pmcollector service after making changes to the configuration file. The GUI must have access to all data from each GUI node."
>
> Firewall ports are open for performance monitoring and MGMT GUI?
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforgui.htm?cp=STXKQY
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_firewallforPMT.htm
>
> Did you set up the collectors with:
> prompt# mmperfmon config generate --collectors collector1.domain.com,collector2.domain.com,...
>
> Once the configuration file has been stored within IBM Spectrum Scale, it can be activated as follows.
> prompt# mmchnode --perfmon -N nodeclass1,nodeclass2,...
>
> Perhaps once you make sure the federated mode is set between hostA and hostB as you like then 'systemctl restart pmcollector' and then 'systemctl restart gpfsgui' on both nodes?
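For reference, the steps discussed in this thread can be pulled together into one short sequence. This is only a sketch, not taken verbatim from any message above: it assumes a two-collector 4.2.x cluster using the hostA/hostB names from this discussion, systemd-based nodes, and that the peers stanza shown earlier has already been added to /opt/IBM/zimon/ZIMonCollector.cfg on both collectors.

# define both nodes as collectors; this writes the collectors section used by
# /opt/IBM/zimon/ZIMonSensors.cfg on the sensor nodes
mmperfmon config generate --collectors hostA.nersc.gov,hostB.nersc.gov

# enable the sensors on the nodes that should report in
mmchnode --perfmon -N hostA.nersc.gov,hostB.nersc.gov

# restart the daemons so the peer/federation change is picked up
systemctl restart pmcollector
systemctl restart pmsensors

# on the GUI node only
systemctl restart gpfsgui

# sanity check: with federation working, each collector should be able to
# answer for the other node
mmperfmon query cpu -N hostA.nersc.gov
mmperfmon query cpu -N hostB.nersc.gov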
> > > > [image: Inactive hide details for gpfsug-discuss-request---05/24/2017 > 12:58:21 PM---Send gpfsug-discuss mailing list submissions to gp] > gpfsug-discuss-request---05/24/2017 12:58:21 PM---Send gpfsug-discuss > mailing list submissions to gpfsug-discuss at spectrumscale.org > > From: gpfsug-discuss-request at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Date: 05/24/2017 12:58 PM > Subject: gpfsug-discuss Digest, Vol 64, Issue 61 > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. SS Metrics (Zimon) and SS GUI, Federation not working > (Kristy Kallback-Rose) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 24 May 2017 12:57:49 -0700 > From: Kristy Kallback-Rose > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] SS Metrics (Zimon) and SS GUI, Federation > not working > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > > Hello, > > We have been experimenting with Zimon and the SS GUI on our dev cluster > under 4.2.3. Things work well with one collector, but I'm running into > issues when trying to use symmetric collector peers, i.e. federation. > > hostA and hostB are setup as both collectors and sensors with each a > collector peer for the other. When this is done I can use mmperfmon to > query hostA from hostA or hostB and vice versa. However, with this > federation setup, the GUI fails to show data. The GUI is running on hostB. > >From the collector candidate pool, hostA has been selected (automatically, > not manually) as can be seen in the sensor configuration file. The GUI is > unable to load data (just shows "Loading" on the graph), *unless* I change > the setting of the ZIMonAddress variable in > /usr/lpp/mmfs/gui/conf/gpfsgui.properties > from localhost to hostA explicitly, it does not work if I change it to > hostB explicitly. The GUI also works fine if I remove the peer entries > altogether and just have one collector. > > I thought that federation meant that no matter which collector was > queried the data would be returned. This appears to work for mmperfmon, but > not the GUI. Can anyone advise? I also don't like the idea of having a pool > of collector candidates and hard-coding one into the GUI configuration. I > am including some output below to show the configs and query results. > > Thanks, > > Kristy > > > The peers are added into the ZIMonCollector.cfg using the default port > 9085: > > peers = { > > host = "hostA" > > port = "9085" > > }, > > { > > host = "hostB" > > port = "9085" > > } > > > And the nodes are added as collector candidates, on hostA and hostB you > see, looking at the config file directly, in /opt/IBM/zimon/ZIMonSensors. 
> cfg: > > colCandidates = "hostA.nersc.gov ", " > hostB.nersc.gov " > > colRedundancy = 1 > > collectors = { > > host = "hostA.nersc.gov " > > port = "4739" > > } > > > Showing the config with mmperfmon config show: > > colCandidates = "hostA.nersc.gov ", " > hostB.nersc.gov " > > colRedundancy = 1 > > collectors = { > > host = "" > > > Using mmperfmon I can query either host. > > > [root at hostA ~]# mmperfmon query cpu -N hostB > > > Legend: > > 1: hostB.nersc.gov |CPU|cpu_system > > 2: hostB.nersc.gov |CPU|cpu_user > > 3: hostB.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:03:54 0.54 3.67 4961 > > 2 2017-05-23-17:03:55 0.63 3.55 6199 > > 3 2017-05-23-17:03:56 1.59 3.76 7914 > > 4 2017-05-23-17:03:57 1.38 5.34 5393 > > 5 2017-05-23-17:03:58 0.54 2.21 2435 > > 6 2017-05-23-17:03:59 0.13 0.29 2519 > > 7 2017-05-23-17:04:00 0.13 0.25 2197 > > 8 2017-05-23-17:04:01 0.13 0.29 2473 > > 9 2017-05-23-17:04:02 0.08 0.21 2336 > > 10 2017-05-23-17:04:03 0.13 0.21 2312 > > > [root@ hostB ~]# mmperfmon query cpu -N hostB > > > Legend: > > 1: hostB.nersc.gov |CPU|cpu_system > > 2: hostB.nersc.gov |CPU|cpu_user > > 3: hostB.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:04:07 0.13 0.21 2010 > > 2 2017-05-23-17:04:08 0.04 0.21 2571 > > 3 2017-05-23-17:04:09 0.08 0.25 2766 > > 4 2017-05-23-17:04:10 0.13 0.29 3147 > > 5 2017-05-23-17:04:11 0.83 0.83 2596 > > 6 2017-05-23-17:04:12 0.33 0.54 2530 > > 7 2017-05-23-17:04:13 0.08 0.33 2428 > > 8 2017-05-23-17:04:14 0.13 0.25 2326 > > 9 2017-05-23-17:04:15 0.13 0.29 4190 > > 10 2017-05-23-17:04:16 0.58 1.92 5882 > > > [root@ hostB ~]# mmperfmon query cpu -N hostA > > > Legend: > > 1: hostA.nersc.gov |CPU|cpu_system > > 2: hostA.nersc.gov |CPU|cpu_user > > 3: hostA.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:05:45 0.33 0.46 7460 > > 2 2017-05-23-17:05:46 0.33 0.42 8993 > > 3 2017-05-23-17:05:47 0.42 0.54 8709 > > 4 2017-05-23-17:05:48 0.38 0.5 5923 > > 5 2017-05-23-17:05:49 0.54 1.46 7381 > > 6 2017-05-23-17:05:50 0.58 3.51 10381 > > 7 2017-05-23-17:05:51 1.05 1.13 10995 > > 8 2017-05-23-17:05:52 0.88 0.92 10855 > > 9 2017-05-23-17:05:53 0.5 0.63 10958 > > 10 2017-05-23-17:05:54 0.5 0.59 10285 > > > [root@ hostA ~]# mmperfmon query cpu -N hostA > > > Legend: > > 1: hostA.nersc.gov |CPU|cpu_system > > 2: hostA.nersc.gov |CPU|cpu_user > > 3: hostA.nersc.gov |CPU|cpu_contexts > > > > Row Timestamp cpu_system cpu_user cpu_contexts > > 1 2017-05-23-17:05:50 0.58 3.51 10381 > > 2 2017-05-23-17:05:51 1.05 1.13 10995 > > 3 2017-05-23-17:05:52 0.88 0.92 10855 > > 4 2017-05-23-17:05:53 0.5 0.63 10958 > > 5 2017-05-23-17:05:54 0.5 0.59 10285 > > 6 2017-05-23-17:05:55 0.46 0.63 11621 > > 7 2017-05-23-17:05:56 0.84 0.92 11477 > > 8 2017-05-23-17:05:57 1.47 1.88 11084 > > 9 2017-05-23-17:05:58 0.46 1.76 9125 > > 10 2017-05-23-17:05:59 0.42 0.63 11745 > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: 20170524/e64509b9/attachment.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 64, Issue 61 > ********************************************** > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Mon May 29 21:01:38 2017 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 29 May 2017 16:01:38 -0400 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors In-Reply-To: References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: <20170529160138.18847jpj5x9kz8ki@support.scinet.utoronto.ca> Quoting "Marc A Kaplan" : > Easier than hacking mmbackup or writing/editing policy rules, > > mmbackup interprets > your TSM INCLUDE/EXCLUDE configuration statements -- so that is a > supported and recommended way of doing business... Finally got some time to resume testing on this Here is the syntax used (In this test I want to backup /wosgpfs/backmeup only) mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm --tsm-errorlog $logfile -L 4 As far as I can tell, the EXCLUDE statements in the TSM configuration (dsm.opt) are being *ignored*. I tried a couple of formats: 1) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" INCLExcl "/sysadmin/BA/ba-wos/bin/inclexcl" 1a) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore /wosgpfs/junk /wosgpfs/project 1b) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 2) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 3) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup -/wosgpfs/ignore -/wosgpfs/junk -/wosgpfs/project" In another words, all the contents under /wosgpfs are being traversed and going to the TSM backup. Furthermore, even with "-L 4" mmbackup is not logging the list of files being sent to the TSM backup anywhere on the client side. I only get that information from the TSM server side (get filespace). I know that all contents of /wosgpfs are being traversed because I have a tail on /wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update > > If that doesn't do it for your purposes... You're into some light > hacking... So look inside the mmbackup and tsbackup33 scripts and you'll > find some DEBUG variables that should allow for keeping work and temp > files around ... including the generated policy rules. > I'm calling this hacking "light", because I don't think you'll need to > change the scripts, but just look around and see how you can use what's > there to achieve your legitimate purposes. Even so, you will have crossed > a line where IBM support is "informal" at best. On the other hand I am having better luck with the customer rules file. 
The modified template below will traverse only the /wosgpfs/backmeup, as intended, and only backup files modified under that path. I guess I have a working solution that I will try at scale now. [root at wos-gateway02 bin]# cat dsm.opt SERVERNAME TAPENODE3 ARCHSYMLINKASFILE NO DOMAIN "/wosgpfs/backmeup" __________________________________________________________ /* Auto-generated GPFS policy rules file * Generated on Wed May 24 12:12:51 2017 */ /* Server rules for backup server 1 *** TAPENODE3 *** */ RULE EXTERNAL LIST 'mmbackup.1.TAPENODE3' EXEC '/wosgpfs/.mmbackupCfg/BAexecScript.wosgpfs' OPTS '"/wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update" "-servername=TAPENODE3" "-auditlogname=/wosgpfs/mmbackup.audit.wosgpfs.TAPENODE3" "NONE"' RULE 'BackupRule' LIST 'mmbackup.1.TAPENODE3' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' ELSE 'resdnt' END )) WHERE ( NOT ( (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (PATH_NAME LIKE '/%/.g2w/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/ignore/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/junk/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/project/%') OR /* DO NOT TRAVERSE OR BACKUP */ (MODE LIKE 's%') ) ) AND (PATH_NAME LIKE '/%/backmeup/%') /* TRAVERSE AND BACKUP */ AND (MISC_ATTRIBUTES LIKE '%u%') AND ( NOT ( (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') ) ) AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) _________________________________________________________ [root at wos-gateway02 bin]# time ./mmbackup-wos.sh -------------------------------------------------------- mmbackup: Backup of /wosgpfs begins at Mon May 29 15:54:47 EDT 2017. -------------------------------------------------------- Mon May 29 15:54:49 2017 mmbackup:using user supplied policy rules: /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs Mon May 29 15:54:49 2017 mmbackup:Scanning file system wosgpfs Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3]. Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3] Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired]. mmbackup: TSM Summary Information: Total number of objects inspected: 3 Total number of objects backed up: 3 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of bytes inspected: 4096 Total number of bytes transferred: 512 ---------------------------------------------------------- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. ---------------------------------------------------------- real 0m9.276s user 0m2.906s sys 0m3.212s _________________________________________________________ Thanks for all the help Jaime > > > > > From: Jez Tucker > To: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 03:33 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? 
was: > mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > When mmbackup has passed the preflight stage (pretty quickly) you'll > find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* > > Best, > > Jez > > > On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is using > as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > > thin air.... Capture the rules mmbackup creates and make small changes to > > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > > Plan.... Then do some dry run recoveries before you really "need" to do a > > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. > > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. It would be nice to get comments from > somebody familiar with the inner works of mmbackup. 
> > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that > the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support > team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused > and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such > thing. > Filesets are kinda like little filesystems within filesystems. Moving > a > file from one fileset to another requires a copy operation. There is > no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that > are > in a separable range of inode numbers - this allows GPFS to efficiently > do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be > represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you > may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor > ESS, > so anyone in this list feel free to give feedback on that page people > with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a > new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. 
exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > > Quoting "Luis Bolinches" : > > Hi > > have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as > inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, > and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > > > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Jez Tucker > Head of Research and Development, Pixit Media > 07764193820 | jtucker at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia.com > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. From Tomasz.Wolski at ts.fujitsu.com Mon May 29 21:23:12 2017 From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com) Date: Mon, 29 May 2017 20:23:12 +0000 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Message-ID: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to "Concepts, Planning and Installation Guide" (for 4.2.3), there's a limited compatibility between two GPFS versions and if they're not adjacent, then following update path is advised: "If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x" My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with "mmchconfig release=LATEST" until all nodes in the cluster will have been updated to version 4.2.3? 
In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) [cid:image002.gif at 01CE62B9.8ACFA960] FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2774 bytes Desc: image001.gif URL: From knop at us.ibm.com Tue May 30 03:54:04 2017 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 29 May 2017 22:54:04 -0400 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: Tomasz, The statement below from "Concepts, Planning and Installation Guide" was found to be incorrect and is being withdrawn from the publications. The team is currently working on improvements to the guidance being provided for migration. For a cluster which is not running protocols like NFS/SMB/Object, migration of nodes one-at-a-time from 4.1.1 to 4.2.3 should work. Once all nodes are migrated to 4.2.3, command mmchconfig release=LATEST can be issued to move the cluster to the 4.2.3 level. Note that the command above will not change the file system level. The file system can be moved to the latest level with command mmchfs file-system-name -V full In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? That is expected to work. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 05/29/2017 04:24 PM Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We are planning to integrate new IBM Spectrum Scale version 4.2.3 into our software, but in our current software release we have version 4.1.1 integrated. We are worried how would node-at-a-time updates look like when our customer wanted to update his cluster from 4.1.1 to 4.2.3 version. According to ?Concepts, Planning and Installation Guide? (for 4.2.3), there?s a limited compatibility between two GPFS versions and if they?re not adjacent, then following update path is advised: ?If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x? My question is: is the above statement true even though on nodes where new GPFS 4.2.3 is installed these nodes will not be migrated to latest release with ?mmchconfig release=LATEST? until all nodes in the cluster will have been updated to version 4.2.3? 
In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? Best regards, Tomasz Wolski With best regards / Mit freundlichen Gr??en / Pozdrawiam Tomasz Wolski Development Engineer NDC Eternus CS HE (ET2) FUJITSU Fujitsu Technology Solutions Sp. z o.o. Textorial Park Bldg C, ul. Fabryczna 17 90-344 Lodz, Poland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2774 bytes Desc: not available URL: From Achim.Rehor at de.ibm.com Tue May 30 08:42:23 2017 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 30 May 2017 09:42:23 +0200 Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3 In-Reply-To: References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 2774 bytes Desc: not available URL: From andreas.petzold at kit.edu Tue May 30 13:16:40 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 14:16:40 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes Message-ID: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well-behaved node):

14:03:37.637526  R  data  32:138835918848  8192  46.626  cli  0A417D79:58E3B179  172.18.224.19
14:03:37.660177  R  data  18:12590325760   8192  25.498  cli  0A4179AD:58E3AE66  172.18.224.14
14:03:37.640660  R  data  15:106365067264  8192  45.682  cli  0A4179AD:58E3ADD7  172.18.224.14
14:03:37.657006  R  data  35:130482421760  8192  30.872  cli  0A417DAD:58E3B266  172.18.224.21
14:03:37.643908  R  data  33:107847139328  8192  45.571  cli  0A417DAD:58E3B206  172.18.224.21

Since a few days we see this on the problematic node:

14:06:27.253537  R  data  46:126258287872  8  15.474  cli  0A4179AB:58E3AE54  172.18.224.13
14:06:27.268626  R  data  40:137280768624  8   0.395  cli  0A4179AD:58E3ADE3  172.18.224.14
14:06:27.269056  R  data  46:56452781528   8   0.427  cli  0A4179AB:58E3AE54  172.18.224.13
14:06:27.269417  R  data  47:97273159640   8   0.293  cli  0A4179AD:58E3AE5A  172.18.224.14
14:06:27.269293  R  data  49:59102786168   8   0.425  cli  0A4179AD:58E3AE72  172.18.224.14
14:06:27.269531  R  data  46:142387326944  8   0.340  cli  0A4179AB:58E3AE54  172.18.224.13
14:06:27.269377  R  data  28:102988517096  8   0.554  cli  0A417879:58E3AD08  172.18.224.10

The number of read ops has gone up by O(1000), which is what one would expect when going from 8192-sector reads to 8-sector reads.

We have already excluded problems with the node itself, so we are focusing on the applications running on the node. What we'd like to do is associate the I/O requests either with files or with specific processes running on the machine, in order to be able to blame the correct application. Can somebody tell us if this is possible, and if not, whether there are other ways to understand which application is causing this?

Thanks,

Andreas

--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Andreas Petzold
Hermann-von-Helmholtz-Platz 1, Building 449, Room 202
D-76344 Eggenstein-Leopoldshafen
Tel: +49 721 608 24916
Fax: +49 721 608 24972
Email: petzold at kit.edu
www.scc.kit.edu
KIT - The Research University in the Helmholtz Association
Since 2010, KIT has been certified as a family-friendly university.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5323 bytes
Desc: S/MIME Cryptographic Signature
URL:

From john.hearns at asml.com Tue May 30 13:28:17 2017
From: john.hearns at asml.com (John Hearns)
Date: Tue, 30 May 2017 12:28:17 +0000
Subject: [gpfsug-discuss] Associating I/O operations with files/processes
In-Reply-To:
References:
Message-ID:

Andreas,

This is a stupid reply, but please bear with me. Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. We also had a new application which did post-processing. One of the users reported that a post-processing job would take about 30 minutes. However, when two or more instances of the same application were running, the job would take several hours.

We finally found that this slowdown was due to the IO size: the application was using the default size. We only found this by stracing the application and spending hours staring at the trace...

I am sure there are better tools for this, and I do hope you don't have to strace every application... really. A good tool to get a general feel for IO patterns is 'iotop'. It might help?
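To make that concrete, a rough sketch of this kind of per-process tracing on Linux might look like the following. The <pid> placeholder and the output path are made up for illustration; the idea is simply to see which files a suspect dCache/xrootd process has open on the GPFS mount and what request sizes it is actually issuing.

# list open files on the GPFS filesystem for a suspect process
lsof -p <pid> | grep /gpfs

# trace its read/write calls for a short while; the return values and the
# per-call timing (-T) show the actual request sizes and latencies
strace -f -p <pid> -e trace=read,pread64,write,pwrite64 -T -o /tmp/io-trace.txt

# cheap cumulative per-process I/O counters, good for spotting the worst offender
cat /proc/<pid>/io

# iotop can also be narrowed to a handful of PIDs if the full view is too noisy
iotop -o -p <pid1> -p <pid2>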
-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) Sent: Tuesday, May 30, 2017 2:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Associating I/O operations with files/processes Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu https://emea01.safelinks.protection.outlook.com/?url=www.scc.kit.edu&data=01%7C01%7Cjohn.hearns%40asml.com%7Cd3f8f819bf21408c419e08d4a755bde9%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=IwCAFwU6OI38yZK9cnmAcWpWD%2BlujeYDpgXuvvAdvVg%3D&reserved=0 KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From andreas.petzold at kit.edu Tue May 30 14:12:52 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 15:12:52 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Hi John, iotop wasn't helpful. It seems to be overwhelmed by what is going on on the machine. Cheers, Andreas On 05/30/2017 02:28 PM, John Hearns wrote: > Andreas, > This is a stupid reply, but please bear with me. > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. > We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. > However when two or more of the same application were running the job would take several hours. > > We finally found that this slowdown was due to the IO size, the application was using the default size. > We only found this by stracing the application and spending hours staring at the trace... > > I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. > A good tool to get a general feel for IO pattersn is 'iotop'. It might help? > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) > Sent: Tuesday, May 30, 2017 2:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > Dear group, > > first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. > > Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 > > Since a few days we see this on the problematic node: > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 > > The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. > > We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? > > Thanks, > > Andreas > > -- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From aaron.s.knister at nasa.gov Tue May 30 14:47:52 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Tue, 30 May 2017 13:47:52 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> , <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: <89666459-A01A-4B1D-BEDF-F742E8E888A9@nasa.gov> Hi Andreas, I often start with an lsof to see who has files open on the troubled filesystem and then start stracing the various processes to see which is responsible. It ought to be a process blocked in uninterruptible sleep and ideally would be obvious but on a shared machine it might not be. Something else you could do is a reverse lookup of the disk addresseses in iohist using mmfileid. This won't help if these are transient files but it could point you in the right direction. Careful though it'll give your metadata disks a tickle :) the syntax would be "mmfileid $FsName -d :$DiskAddrrss" where $DiskAddress is the 4th field from the iohist". It's not a quick command to return-- it could easily take up to a half hour but it should tell you which file path contains that disk address. Sometimes this is all too tedious and in that case grabbing some trace data can help. When you're experiencing I/O trouble you can run "mmtrace trace=def start" on the node, wait about a minute or so and then run "mmtrace stop". 
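Pulled together, the sequence looks roughly like this (a sketch only -- 'fs1' and '/gpfs/fs1' stand in for your real file system name and mount point, and the example disk address is taken from your iohist output):

   # who has files open on the troubled file system
   lsof /gpfs/fs1
   # collect the disk addresses (4th column) of the suspicious 8-sector reads
   mmdiag --iohist | awk '$5 == 8 {print $4}' | sort -u
   # map one disk address back to a file path -- slow, and it tickles the metadata disks
   mmfileid fs1 -d :46:126258287872
   # optionally grab about a minute of trace data
   mmtrace trace=def start ; sleep 60 ; mmtrace stop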
The resulting trcrpt file is bit of a monster to go through but I do believe you can see which PIDs are responsible for the I/O given some sleuthing. If it comes to that let me know and I'll see if I can point you at some phrases to grep for. It's been a while since I've done it. -Aaron On May 30, 2017 at 09:13:09 EDT, Andreas Petzold (SCC) wrote: Hi John, iotop wasn't helpful. It seems to be overwhelmed by what is going on on the machine. Cheers, Andreas On 05/30/2017 02:28 PM, John Hearns wrote: > Andreas, > This is a stupid reply, but please bear with me. > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS filesystem) setup. > We also had a new application which did post-processing One of the users reported that a post-processing job would take about 30 minutes. > However when two or more of the same application were running the job would take several hours. > > We finally found that this slowdown was due to the IO size, the application was using the default size. > We only found this by stracing the application and spending hours staring at the trace... > > I am sure there are better tools for this, and I do hope you don?t have to strace every application.... really. > A good tool to get a general feel for IO pattersn is 'iotop'. It might help? > > > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold (SCC) > Sent: Tuesday, May 30, 2017 2:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > Dear group, > > first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. > > Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
Before we were usually seeing reads like this (iohist example from a well behaved node): > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 > > Since a few days we see this on the problematic node: > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 > > The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. > > We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? > > Thanks, > > Andreas > > -- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Tue May 30 14:55:30 2017 From: oehmes at gmail.com (Sven Oehme) Date: Tue, 30 May 2017 13:55:30 +0000 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: Hi, the very first thing to do would be to do a mmfsadm dump iohist instead of mmdiag --iohist one time (we actually add this info in next release to mmdiag --iohist) to see if the thread type will reveal something : 07:25:53.578522 W data 1:20260249600 8192 35.930 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch WritebehindWorkerThread 07:25:53.632722 W data 1:20260257792 8192 45.179 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner CleanBufferThread 07:25:53.662067 W data 2:20259815424 8192 45.612 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch WritebehindWorkerThread 07:25:53.734274 W data 1:19601858560 8 0.624 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler *DioHandlerThread* if you see DioHandlerThread most likely somebody changed a openflag to use O_DIRECT . if you don't use that flag even the app does only 4k i/o which is inefficient GPFS will detect this and do prefetch writebehind in large blocks, as soon as you add O_DIRECT, we don't do this anymore to honor the hint and then every single request gets handled one by one. 
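(A quick way to check that from the OS side, as a sketch -- <PID> and <FD> are placeholders, and the flag value assumes Linux on x86_64, where O_DIRECT is octal 040000:)

   # list the open file descriptors of the suspect process
   ls -l /proc/<PID>/fd
   # the 'flags' line is octal; if it includes the 040000 bit the file was opened with O_DIRECT
   grep flags: /proc/<PID>/fdinfo/<FD>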
after that the next thing would be to run a very low level trace with just IO infos like : mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N . this will start collection on the node you execute the command if you want to run it against a different node replace the dot at the end with the hostname . wait a few seconds and run mmtracectl --off you will get a message that the trace gets formated and a eventually a trace file . now grep for FIO and you get lines like this : 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 nSectors 32768 err 0 if you further reduce it to nSectors 8 you would focus only on your 4k writes you mentioned above. the key item in the line above you care about is tag 16... this is the inode number of your file. if you now do : cd /usr/lpp/mmfs/samples/util ; make then run (replace -i and filesystem path obviously) [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/ and you get a hit like this : 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf you now know the file that is being accessed in the I/O example above is /ibm/fs2-16m-09//shared/test-newbuf hope that helps. sven On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) < andreas.petzold at kit.edu> wrote: > Hi John, > > iotop wasn't helpful. It seems to be overwhelmed by what is going on on > the machine. > > Cheers, > > Andreas > > On 05/30/2017 02:28 PM, John Hearns wrote: > > Andreas, > > This is a stupid reply, but please bear with me. > > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS > filesystem) setup. > > We also had a new application which did post-processing One of the users > reported that a post-processing job would take about 30 minutes. > > However when two or more of the same application were running the job > would take several hours. > > > > We finally found that this slowdown was due to the IO size, the > application was using the default size. > > We only found this by stracing the application and spending hours > staring at the trace... > > > > I am sure there are better tools for this, and I do hope you don?t have > to strace every application.... really. > > A good tool to get a general feel for IO pattersn is 'iotop'. It might > help? > > > > > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andreas Petzold > (SCC) > > Sent: Tuesday, May 30, 2017 2:17 PM > > To: gpfsug-discuss at spectrumscale.org > > Subject: [gpfsug-discuss] Associating I/O operations with files/processes > > > > Dear group, > > > > first a quick introduction: at KIT we are running a 20+PB storage system > with several large (1-9PB) file systems. We have a 14 node NSD server > cluster and 5 small (~10 nodes) protocol node clusters which each mount one > of the file systems. The protocol nodes run server software (dCache, > xrootd) specific to our users which primarily are the LHC experiments at > CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, > while the protocol nodes communicate via Ethernet to their clients. > > > > Now let me describe the problem we are facing. Since a few days, one of > the protocol nodes shows a very strange and as of yet unexplained I/O > behaviour. 
Before we were usually seeing reads like this (iohist example > from a well behaved node): > > > > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli > 0A417D79:58E3B179 172.18.224.19 > > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli > 0A4179AD:58E3AE66 172.18.224.14 > > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli > 0A4179AD:58E3ADD7 172.18.224.14 > > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli > 0A417DAD:58E3B266 172.18.224.21 > > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli > 0A417DAD:58E3B206 172.18.224.21 > > > > Since a few days we see this on the problematic node: > > > > 14:06:27.253537 R data 46:126258287872 8 15.474 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.268626 R data 40:137280768624 8 0.395 cli > 0A4179AD:58E3ADE3 172.18.224.14 > > 14:06:27.269056 R data 46:56452781528 8 0.427 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269417 R data 47:97273159640 8 0.293 cli > 0A4179AD:58E3AE5A 172.18.224.14 > > 14:06:27.269293 R data 49:59102786168 8 0.425 cli > 0A4179AD:58E3AE72 172.18.224.14 > > 14:06:27.269531 R data 46:142387326944 8 0.340 cli > 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269377 R data 28:102988517096 8 0.554 cli > 0A417879:58E3AD08 172.18.224.10 > > > > The number of read ops has gone up by O(1000) which is what one would > expect when going from 8192 sector reads to 8 sector reads. > > > > We have already excluded problems of node itself so we are focusing on > the applications running on the node. What we'd like to to is to associate > the I/O requests either with files or specific processes running on the > machine in order to be able to blame the correct application. Can somebody > tell us, if this is possible and if now, if there are other ways to > understand what application is causing this? > > > > Thanks, > > > > Andreas > > > > -- > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.petzold at kit.edu Tue May 30 15:00:27 2017 From: andreas.petzold at kit.edu (Andreas Petzold (SCC)) Date: Tue, 30 May 2017 16:00:27 +0200 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> <66dace57-ac3c-4db8-de2a-fd6e2d18d9f7@kit.edu> Message-ID: <45aa5c60-4a79-015a-7236-556b7834714f@kit.edu> Hi Sven, we are seeing FileBlockRandomReadFetchHandlerThread. I'll let you know once I have more results Thanks, Andreas On 05/30/2017 03:55 PM, Sven Oehme wrote: > Hi, > > the very first thing to do would be to do a mmfsadm dump iohist instead > of mmdiag --iohist one time (we actually add this info in next release > to mmdiag --iohist) to see if the thread type will reveal something : > > 07:25:53.578522 W data 1:20260249600 8192 35.930 > 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch > WritebehindWorkerThread > 07:25:53.632722 W data 1:20260257792 8192 45.179 > 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner > CleanBufferThread > 07:25:53.662067 W data 2:20259815424 8192 45.612 > 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch > WritebehindWorkerThread > 07:25:53.734274 W data 1:19601858560 8 0.624 > 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler > *_DioHandlerThread_* > > if you see DioHandlerThread most likely somebody changed a openflag to > use O_DIRECT . 
if you don't use that flag even the app does only 4k i/o > which is inefficient GPFS will detect this and do prefetch writebehind > in large blocks, as soon as you add O_DIRECT, we don't do this anymore > to honor the hint and then every single request gets handled one by one. > > after that the next thing would be to run a very low level trace with > just IO infos like : > > mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N . > > this will start collection on the node you execute the command if you > want to run it against a different node replace the dot at the end with > the hostname . > wait a few seconds and run > > mmtracectl --off > > you will get a message that the trace gets formated and a eventually a > trace file . > now grep for FIO and you get lines like this : > > > 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize > 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 > nSectors 32768 err 0 > > if you further reduce it to nSectors 8 you would focus only on your 4k > writes you mentioned above. > > the key item in the line above you care about is tag 16... this is the > inode number of your file. > if you now do : > > cd /usr/lpp/mmfs/samples/util ; make > then run (replace -i and filesystem path obviously) > > [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/ > > and you get a hit like this : > > 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf > > you now know the file that is being accessed in the I/O example above is > /ibm/fs2-16m-09//shared/test-newbuf > > hope that helps. > > sven > > > > > On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC) > > wrote: > > Hi John, > > iotop wasn't helpful. It seems to be overwhelmed by what is going on on > the machine. > > Cheers, > > Andreas > > On 05/30/2017 02:28 PM, John Hearns wrote: > > Andreas, > > This is a stupid reply, but please bear with me. > > Not exactly GPFS related, but I once managed an SGI CXFS > (Clustered XFS filesystem) setup. > > We also had a new application which did post-processing One of the > users reported that a post-processing job would take about 30 minutes. > > However when two or more of the same application were running the > job would take several hours. > > > > We finally found that this slowdown was due to the IO size, the > application was using the default size. > > We only found this by stracing the application and spending hours > staring at the trace... > > > > I am sure there are better tools for this, and I do hope you don?t > have to strace every application.... really. > > A good tool to get a general feel for IO pattersn is 'iotop'. It > might help? > > > > > > > > > > -----Original Message----- > > From: gpfsug-discuss-bounces at spectrumscale.org > > [mailto:gpfsug-discuss-bounces at spectrumscale.org > ] On Behalf Of > Andreas Petzold (SCC) > > Sent: Tuesday, May 30, 2017 2:17 PM > > To: gpfsug-discuss at spectrumscale.org > > > Subject: [gpfsug-discuss] Associating I/O operations with > files/processes > > > > Dear group, > > > > first a quick introduction: at KIT we are running a 20+PB storage > system with several large (1-9PB) file systems. We have a 14 node > NSD server cluster and 5 small (~10 nodes) protocol node clusters > which each mount one of the file systems. The protocol nodes run > server software (dCache, xrootd) specific to our users which > primarily are the LHC experiments at CERN. GPFS version is 4.2.2 > everywhere. 
All servers are connected via IB, while the protocol > nodes communicate via Ethernet to their clients. > > > > Now let me describe the problem we are facing. Since a few days, > one of the protocol nodes shows a very strange and as of yet > unexplained I/O behaviour. Before we were usually seeing reads like > this (iohist example from a well behaved node): > > > > 14:03:37.637526 R data 32:138835918848 8192 46.626 > cli 0A417D79:58E3B179 172.18.224.19 > > 14:03:37.660177 R data 18:12590325760 8192 25.498 > cli 0A4179AD:58E3AE66 172.18.224.14 > > 14:03:37.640660 R data 15:106365067264 8192 45.682 > cli 0A4179AD:58E3ADD7 172.18.224.14 > > 14:03:37.657006 R data 35:130482421760 8192 30.872 > cli 0A417DAD:58E3B266 172.18.224.21 > > 14:03:37.643908 R data 33:107847139328 8192 45.571 > cli 0A417DAD:58E3B206 172.18.224.21 > > > > Since a few days we see this on the problematic node: > > > > 14:06:27.253537 R data 46:126258287872 8 15.474 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.268626 R data 40:137280768624 8 0.395 > cli 0A4179AD:58E3ADE3 172.18.224.14 > > 14:06:27.269056 R data 46:56452781528 8 0.427 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269417 R data 47:97273159640 8 0.293 > cli 0A4179AD:58E3AE5A 172.18.224.14 > > 14:06:27.269293 R data 49:59102786168 8 0.425 > cli 0A4179AD:58E3AE72 172.18.224.14 > > 14:06:27.269531 R data 46:142387326944 8 0.340 > cli 0A4179AB:58E3AE54 172.18.224.13 > > 14:06:27.269377 R data 28:102988517096 8 0.554 > cli 0A417879:58E3AD08 172.18.224.10 > > > > The number of read ops has gone up by O(1000) which is what one > would expect when going from 8192 sector reads to 8 sector reads. > > > > We have already excluded problems of node itself so we are > focusing on the applications running on the node. What we'd like to > to is to associate the I/O requests either with files or specific > processes running on the machine in order to be able to blame the > correct application. Can somebody tell us, if this is possible and > if now, if there are other ways to understand what application is > causing this? > > > > Thanks, > > > > Andreas > > > > -- > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5323 bytes Desc: S/MIME Cryptographic Signature URL: From makaplan at us.ibm.com Tue May 30 15:39:50 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 May 2017 14:39:50 +0000 Subject: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? 
was: mmbackup with fileset : scope errors In-Reply-To: <20170529160138.18847jpj5x9kz8ki@support.scinet.utoronto.ca> References: <20170517214329.11704r69rjlw6a0x@support.scinet.utoronto.ca><20170517194858.67592w2m4bx0c6sq@support.scinet.utoronto.ca><20170518095851.21296pxm3i2xl2fv@support.scinet.utoronto.ca><20170518123646.70933xe3u3kyvela@support.scinet.utoronto.ca><20170518150246.890675i0fcqdnumu@support.scinet.utoronto.ca> Message-ID: Regarding mmbackup and TSM INCLUDE/EXCLUDE, I found this doc by googling... http://www-01.ibm.com/support/docview.wss?uid=swg21699569 Which says, among other things and includes many ifs,and,buts : "... include and exclude options are interpreted differently by the IBM Spectrum Scale mmbackup command and by the IBM Spectrum Protect backup-archive client..." I think mmbackup tries to handle usual, sensible, variants of the TSM directives that can be directly "translated" to more logical SQL, so you don't have to follow all the twists, but if it isn't working as you expected... RTFM... OTOH... If you are like or can work with the customize-the-policy-rules approach -- that is good too and makes possible finer grain controls. From: "Jaime Pinto" To: "gpfsug main discussion list" , "Marc A Kaplan" Date: 05/29/2017 04:01 PM Subject: Re: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors Quoting "Marc A Kaplan" : > Easier than hacking mmbackup or writing/editing policy rules, > > mmbackup interprets > your TSM INCLUDE/EXCLUDE configuration statements -- so that is a > supported and recommended way of doing business... Finally got some time to resume testing on this Here is the syntax used (In this test I want to backup /wosgpfs/backmeup only) mmbackup /wosgpfs -N wos-gateway02-ib0 -s /dev/shm --tsm-errorlog $logfile -L 4 As far as I can tell, the EXCLUDE statements in the TSM configuration (dsm.opt) are being *ignored*. I tried a couple of formats: 1) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" INCLExcl "/sysadmin/BA/ba-wos/bin/inclexcl" 1a) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore /wosgpfs/junk /wosgpfs/project 1b) cat /sysadmin/BA/ba-wos/bin/inclexcl exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 2) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup" exclude.dir /wosgpfs/ignore exclude.dir /wosgpfs/junk exclude.dir /wosgpfs/project 3) cat dsm.opt SERVERNAME TAPENODE3 DOMAIN "/wosgpfs/backmeup -/wosgpfs/ignore -/wosgpfs/junk -/wosgpfs/project" In another words, all the contents under /wosgpfs are being traversed and going to the TSM backup. Furthermore, even with "-L 4" mmbackup is not logging the list of files being sent to the TSM backup anywhere on the client side. I only get that information from the TSM server side (get filespace). I know that all contents of /wosgpfs are being traversed because I have a tail on /wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update > > If that doesn't do it for your purposes... You're into some light > hacking... So look inside the mmbackup and tsbackup33 scripts and you'll > find some DEBUG variables that should allow for keeping work and temp > files around ... including the generated policy rules. > I'm calling this hacking "light", because I don't think you'll need to > change the scripts, but just look around and see how you can use what's > there to achieve your legitimate purposes. 
Even so, you will have crossed > a line where IBM support is "informal" at best. On the other hand I am having better luck with the customer rules file. The modified template below will traverse only the /wosgpfs/backmeup, as intended, and only backup files modified under that path. I guess I have a working solution that I will try at scale now. [root at wos-gateway02 bin]# cat dsm.opt SERVERNAME TAPENODE3 ARCHSYMLINKASFILE NO DOMAIN "/wosgpfs/backmeup" __________________________________________________________ /* Auto-generated GPFS policy rules file * Generated on Wed May 24 12:12:51 2017 */ /* Server rules for backup server 1 *** TAPENODE3 *** */ RULE EXTERNAL LIST 'mmbackup.1.TAPENODE3' EXEC '/wosgpfs/.mmbackupCfg/BAexecScript.wosgpfs' OPTS '"/wosgpfs/.mmbackupShadow.1.TAPENODE3.filesys.update" "-servername=TAPENODE3" "-auditlogname=/wosgpfs/mmbackup.audit.wosgpfs.TAPENODE3" "NONE"' RULE 'BackupRule' LIST 'mmbackup.1.TAPENODE3' DIRECTORIES_PLUS SHOW(VARCHAR(MODIFICATION_TIME) || ' ' || VARCHAR(CHANGE_TIME) || ' ' || VARCHAR(FILE_SIZE) || ' ' || VARCHAR(FILESET_NAME) || ' ' || (CASE WHEN XATTR('dmapi.IBMObj') IS NOT NULL THEN 'migrat' WHEN XATTR('dmapi.IBMPMig') IS NOT NULL THEN 'premig' ELSE 'resdnt' END )) WHERE ( NOT ( (PATH_NAME LIKE '/%/.mmbackup%') OR (PATH_NAME LIKE '/%/.mmLockDir' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.mmLockDir/%') OR (PATH_NAME LIKE '/%/.g2w/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/ignore/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/junk/%') OR /* DO NOT TRAVERSE OR BACKUP */ (PATH_NAME LIKE '/%/project/%') OR /* DO NOT TRAVERSE OR BACKUP */ (MODE LIKE 's%') ) ) AND (PATH_NAME LIKE '/%/backmeup/%') /* TRAVERSE AND BACKUP */ AND (MISC_ATTRIBUTES LIKE '%u%') AND ( NOT ( (PATH_NAME LIKE '/%/.SpaceMan' AND MODE LIKE 'd%') OR (PATH_NAME LIKE '/%/.SpaceMan/%') ) ) AND (NOT (PATH_NAME LIKE '/%/.TsmCacheDir' AND MODE LIKE 'd%') AND NOT (PATH_NAME LIKE '/%/.TsmCacheDir/%')) _________________________________________________________ [root at wos-gateway02 bin]# time ./mmbackup-wos.sh -------------------------------------------------------- mmbackup: Backup of /wosgpfs begins at Mon May 29 15:54:47 EDT 2017. -------------------------------------------------------- Mon May 29 15:54:49 2017 mmbackup:using user supplied policy rules: /sysadmin/BA/ba-wos/bin/mmbackupRules.wosgpfs Mon May 29 15:54:49 2017 mmbackup:Scanning file system wosgpfs Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3]. Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3] Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired]. mmbackup: TSM Summary Information: Total number of objects inspected: 3 Total number of objects backed up: 3 Total number of objects updated: 0 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of bytes inspected: 4096 Total number of bytes transferred: 512 ---------------------------------------------------------- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. 
---------------------------------------------------------- real 0m9.276s user 0m2.906s sys 0m3.212s _________________________________________________________ Thanks for all the help Jaime > > > > > From: Jez Tucker > To: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 03:33 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > When mmbackup has passed the preflight stage (pretty quickly) you'll > find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* > > Best, > > Jez > > > On 18/05/17 20:02, Jaime Pinto wrote: > Ok Mark > > I'll follow your option 2) suggestion, and capture what mmbackup is using > as a rule first, then modify it. > > I imagine by 'capture' you are referring to the -L n level I use? > > -L n > Controls the level of information displayed by the > mmbackup command. Larger values indicate the > display of more detailed information. n should be one of > the following values: > > 3 > Displays the same information as 2, plus each > candidate file and the applicable rule. > > 4 > Displays the same information as 3, plus each > explicitly EXCLUDEed or LISTed > file, and the applicable rule. > > 5 > Displays the same information as 4, plus the > attributes of candidate and EXCLUDEed or > LISTed files. > > 6 > Displays the same information as 5, plus > non-candidate files and their attributes. > > Thanks > Jaime > > > > > Quoting "Marc A Kaplan" : > > 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup > wants to support incremental backups (using what it calls its shadow > database) and keep both your sanity and its sanity -- so mmbackup limits > you to either full filesystem or full inode-space (independent fileset.) > If you want to do something else, okay, but you have to be careful and be > sure of yourself. IBM will not be able to jump in and help you if and when > > it comes time to restore and you discover that your backup(s) were not > complete. > > 2. If you decide you're a big boy (or woman or XXX) and want to do some > hacking ... Fine... But even then, I suggest you do the smallest hack > that will mostly achieve your goal... > DO NOT think you can create a custom policy rules list for mmbackup out of > > thin air.... Capture the rules mmbackup creates and make small changes to > > that -- > And as with any disaster recovery plan..... Plan your Test and Test your > > Plan.... Then do some dry run recoveries before you really "need" to do a > > real recovery. > > I only even sugest this because Jaime says he has a huge filesystem with > several dependent filesets and he really, really wants to do a partial > backup, without first copying or re-organizing the filesets. > > HMMM.... otoh... if you have one or more dependent filesets that are > smallish, and/or you don't need the backups -- create independent > filesets, copy/move/delete the data, rename, voila. > > > > From: "Jaime Pinto" > To: "Marc A Kaplan" > Cc: "gpfsug main discussion list" > Date: 05/18/2017 12:36 PM > Subject: Re: [gpfsug-discuss] What is an independent fileset? was: > mmbackup with fileset : scope errors > > > > Marc > > The -P option may be a very good workaround, but I still have to test it. > > I'm currently trying to craft the mm rule, as minimalist as possible, > however I'm not sure about what attributes mmbackup expects to see. > > Below is my first attempt. 
It would be nice to get comments from > somebody familiar with the inner works of mmbackup. > > Thanks > Jaime > > > /* A macro to abbreviate VARCHAR */ > define([vc],[VARCHAR($1)]) > > /* Define three external lists */ > RULE EXTERNAL LIST 'allfiles' EXEC > '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' > > /* Generate a list of all files, directories, plus all other file > system objects, > like symlinks, named pipes, etc. Include the owner's id with each > object and > sort them by the owner's id */ > > RULE 'r1' LIST 'allfiles' > DIRECTORIES_PLUS > SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || > vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) > FROM POOL 'system' > FOR FILESET('sysadmin3') > > /* Files in special filesets, such as those excluded, are never traversed > */ > RULE 'ExcSpecialFile' EXCLUDE > FOR FILESET('scratch3','project3') > > > > > > Quoting "Marc A Kaplan" : > > Jaime, > > While we're waiting for the mmbackup expert to weigh in, notice that > the > mmbackup command does have a -P option that allows you to provide a > customized policy rules file. > > So... a fairly safe hack is to do a trial mmbackup run, capture the > automatically generated policy file, and then augment it with FOR > FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup > for > real with your customized policy file. > > mmbackup uses mmapplypolicy which by itself is happy to limit its > directory scan to a particular fileset by using > > mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope > fileset .... > > However, mmbackup probably has other worries and for simpliciity and > helping make sure you get complete, sensible backups, apparently has > imposed some restrictions to preserve sanity (yours and our support > team! > ;-) ) ... (For example, suppose you were doing incremental backups, > starting at different paths each time? -- happy to do so, but when > disaster strikes and you want to restore -- you'll end up confused > and/or > unhappy!) > > "converting from one fileset to another" --- sorry there is no such > thing. > Filesets are kinda like little filesystems within filesystems. Moving > a > file from one fileset to another requires a copy operation. There is > no > fast move nor hardlinking. > > --marc > > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > , > "Marc A Kaplan" > Date: 05/18/2017 09:58 AM > Subject: Re: [gpfsug-discuss] What is an independent fileset? > was: > mmbackup with fileset : scope errors > > > > Thanks for the explanation Mark and Luis, > > It begs the question: why filesets are created as dependent by > default, if the adverse repercussions can be so great afterward? Even > in my case, where I manage GPFS and TSM deployments (and I have been > around for a while), didn't realize at all that not adding and extra > option at fileset creation time would cause me huge trouble with > scaling later on as I try to use mmbackup. > > When you have different groups to manage file systems and backups that > don't read each-other's manuals ahead of time then we have a really > bad recipe. > > I'm looking forward to your explanation as to why mmbackup cares one > way or another. > > I'm also hoping for a hint as to how to configure backup exclusion > rules on the TSM side to exclude fileset traversing on the GPFS side. > Is mmbackup smart enough (actually smarter than TSM client itself) to > read the exclusion rules on the TSM configuration and apply them > before traversing? 
> > Thanks > Jaime > > Quoting "Marc A Kaplan" : > > When I see "independent fileset" (in Spectrum/GPFS/Scale) I always > think > and try to read that as "inode space". > > An "independent fileset" has all the attributes of an (older-fashioned) > dependent fileset PLUS all of its files are represented by inodes that > are > in a separable range of inode numbers - this allows GPFS to efficiently > do > snapshots of just that inode-space (uh... independent fileset)... > > And... of course the files of dependent filesets must also be > represented > by inodes -- those inode numbers are within the inode-space of whatever > the containing independent fileset is... as was chosen when you created > the fileset.... If you didn't say otherwise, inodes come from the > default "root" fileset.... > > Clear as your bath-water, no? > > So why does mmbackup care one way or another ??? Stay tuned.... > > BTW - if you look at the bits of the inode numbers carefully --- you > may > not immediately discern what I mean by a "separable range of inode > numbers" -- (very technical hint) you may need to permute the bit order > before you discern a simple pattern... > > > > From: "Luis Bolinches" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Date: 05/18/2017 02:10 AM > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope > errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Hi > > There is no direct way to convert the one fileset that is dependent to > independent or viceversa. > > I would suggest to take a look to chapter 5 of the 2014 redbook, lots > of > definitions about GPFS ILM including filesets > http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the > only > place that is explained but I honestly believe is a good single start > point. It also needs an update as does nto have anything on CES nor > ESS, > so anyone in this list feel free to give feedback on that page people > with > funding decisions listen there. > > So you are limited to either migrate the data from that fileset to a > new > independent fileset (multiple ways to do that) or use the TSM client > config. > > ----- Original message ----- > From: "Jaime Pinto" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: "gpfsug main discussion list" , > "Jaime Pinto" > Cc: > Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors > Date: Thu, May 18, 2017 4:43 AM > > There is hope. See reference link below: > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm > > > > > > The issue has to do with dependent vs. independent filesets, something > I didn't even realize existed until now. Our filesets are dependent > (for no particular reason), so I have to find a way to turn them into > independent. > > The proper option syntax is "--scope inodespace", and the error > message actually flagged that out, however I didn't know how to > interpret what I saw: > > > # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > -------------------------------------------------------- > mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 > 21:27:43 EDT 2017. > -------------------------------------------------------- > Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* > fileset sysadmin3 is not supported > Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for > fileset level backup. 
exit 1 > -------------------------------------------------------- > > Will post the outcome. > Jaime > > > > Quoting "Jaime Pinto" : > > Quoting "Luis Bolinches" : > > Hi > > have you tried to add exceptions on the TSM client config file? > > Hey Luis, > > That would work as well (mechanically), however it's not elegant or > efficient. When you have over 1PB and 200M files on scratch it will > take many hours and several helper nodes to traverse that fileset just > to be negated by TSM. In fact exclusion on TSM are just as > inefficient. > Considering that I want to keep project and sysadmin on different > domains then it's much worst, since we have to traverse and exclude > scratch & (project|sysadmin) twice, once to capture sysadmin and again > to capture project. > > If I have to use exclusion rules it has to rely sole on gpfs rules, > and > somehow not traverse scratch at all. > > I suspect there is a way to do this properly, however the examples on > the gpfs guide and other references are not exhaustive. They only show > a couple of trivial cases. > > However my situation is not unique. I suspect there are may facilities > having to deal with backup of HUGE filesets. > > So the search is on. > > Thanks > Jaime > > > > > > Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is > linked > on /IBM/GPFS/FSET1 > > dsm.sys > ... > > DOMAIN /IBM/GPFS > EXCLUDE.DIR /IBM/GPFS/FSET1 > > > From: "Jaime Pinto" > To: "gpfsug main discussion list" > > Date: 17-05-17 23:44 > Subject: [gpfsug-discuss] mmbackup with fileset : scope errors > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: > * project3 > * scratch3 > * sysadmin3 > > I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we > have no need or space to include *scratch3* on TSM. > > Question: how to craft the mmbackup command to backup > /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? > > Below are 3 types of errors: > > 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope inodespace --tsm-errorlog $logfile -L 2 > > ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up > dependent fileset sysadmin3 is not supported > Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for > fileset level backup. exit 1 > > 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm > --scope filesystem --tsm-errorlog $logfile -L 2 > > ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem > cannot be specified at the same time. > > These examples don't really cover my case: > > > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples > > > > > > > Thanks > Jaime > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: > Oy IBM Finland Ab > PL 265, 00101 Helsinki, Finland > Business ID, Y-tunnus: 0195876-3 > Registered in Finland > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. 
> > > > > > > > > > > > > ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > > ---------------------------------------------------------------- > This message was sent using IMP at SciNet Consortium, University of > Toronto. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Jez Tucker > Head of Research and Development, Pixit Media > 07764193820 | jtucker at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia.com > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ---------------------------------------------------------------- This message was sent using IMP at SciNet Consortium, University of Toronto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue May 30 16:15:11 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 May 2017 11:15:11 -0400 Subject: [gpfsug-discuss] Associating I/O operations with files/processes In-Reply-To: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> References: <87954142-e86a-c365-73bb-004c00bd5814@kit.edu> Message-ID: In version 4.2.3 you can turn on QOS --fine-stats and --pid-stats and get IO operations statistics for each active process on each node. https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmchqos.htm https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmlsqos.htm The statistics allow you to distinguish single sector IOPS from partial block multisector iops from full block multisector iops. Notice that to use this feature you must enable QOS, but by default you start by running with all throttles set at "unlimited". There are some overheads, so you might want to use it only when you need to find the "bad" processes. 
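(roughly along these lines; the option spellings below are only a sketch of how I'd expect it to look, so check the mmchqos/mmlsqos man pages for the exact argument forms -- 'gpfs100' and <seconds> are placeholders:)

   # enable QOS (throttles default to unlimited) and switch on per-process fine statistics
   mmchqos gpfs100 --enable --fine-stats <seconds> --pid-stats yes
   # later, pull the collected per-process statistics (the output is CSV-friendly)
   mmlsqos gpfs100 --fine-stats <seconds>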
It's a little tricky to use effectively, but we give you a sample script that shows some ways to produce, massage and filter the raw data: samples/charts/qosplotfine.pl The data is available in a CSV format, so it's easy to feed into spreadsheets or data bases and crunch... --marc of GPFS. From: "Andreas Petzold (SCC)" To: Date: 05/30/2017 08:17 AM Subject: [gpfsug-discuss] Associating I/O operations with files/processes Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear group, first a quick introduction: at KIT we are running a 20+PB storage system with several large (1-9PB) file systems. We have a 14 node NSD server cluster and 5 small (~10 nodes) protocol node clusters which each mount one of the file systems. The protocol nodes run server software (dCache, xrootd) specific to our users which primarily are the LHC experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes communicate via Ethernet to their clients. Now let me describe the problem we are facing. Since a few days, one of the protocol nodes shows a very strange and as of yet unexplained I/O behaviour. Before we were usually seeing reads like this (iohist example from a well behaved node): 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21 Since a few days we see this on the problematic node: 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10 The number of read ops has gone up by O(1000) which is what one would expect when going from 8192 sector reads to 8 sector reads. We have already excluded problems of node itself so we are focusing on the applications running on the node. What we'd like to to is to associate the I/O requests either with files or specific processes running on the machine in order to be able to blame the correct application. Can somebody tell us, if this is possible and if now, if there are other ways to understand what application is causing this? Thanks, Andreas -- Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) Andreas Petzold Hermann-von-Helmholtz-Platz 1, Building 449, Room 202 D-76344 Eggenstein-Leopoldshafen Tel: +49 721 608 24916 Fax: +49 721 608 24972 Email: petzold at kit.edu www.scc.kit.edu KIT ? The Research University in the Helmholtz Association Since 2010, KIT has been certified as a family-friendly university. [attachment "smime.p7s" deleted by Marc A Kaplan/Watson/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From Tomasz.Wolski at ts.fujitsu.com  Wed May 31 10:33:29 2017
From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com)
Date: Wed, 31 May 2017 09:33:29 +0000
Subject: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3
In-Reply-To:
References: <8b2cf027f14a4916aae59976e976de5a@R01UKEXCASM223.r01.fujitsu.local>
Message-ID: <5564b22a89744e06ad7003607248f279@R01UKEXCASM223.r01.fujitsu.local>

Thank you very much - that's very helpful and will save us a lot of effort :)

Best regards,
Tomasz Wolski

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Achim Rehor
Sent: Tuesday, May 30, 2017 9:42 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3

The statement has always been that a release is compatible with release n-1. With "release" here meaning the VRMF 4.2.3.0, all 4.2 release levels ought to be compatible with all 4.1 levels.

As Felipe pointed out below, mmchconfig release=LATEST will not touch the file system level. And if you are running remote clusters, you need to be aware that lifting a file system to the latest level (mmchfs -V full) will lose you the ability of remote clusters to mount it if they are on a lower level. In these cases use the -V compat flag (and see the commands reference for details).

Mit freundlichen Grüßen / Kind regards

Achim Rehor

________________________________
Software Technical Support Specialist AIX / EMEA HPC Support
IBM Certified Advanced Technical Expert - Power Systems with AIX
TSCC Software Service, Dept. 7922
Global Technology Services
________________________________
Phone: +49-7034-274-7862
E-Mail: Achim.Rehor at de.ibm.com
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Germany
________________________________
IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Martina Koederitz (Vorsitzende), Reinhard Reschke, Dieter Scholz, Gregor Pillen, Ivo Koerner, Christian Noll
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 WEEE-Reg.-Nr. DE 99369940

From:   "Felipe Knop"
To:     gpfsug main discussion list
Date:   05/30/2017 04:54 AM
Subject:        Re: [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________

Tomasz,

The statement below from the "Concepts, Planning and Installation Guide" was found to be incorrect and is being withdrawn from the publications. The team is currently working on improvements to the guidance being provided for migration.

For a cluster which is not running protocols like NFS/SMB/Object, migration of nodes one-at-a-time from 4.1.1 to 4.2.3 should work. Once all nodes are migrated to 4.2.3, command

  mmchconfig release=LATEST

can be issued to move the cluster to the 4.2.3 level.

Note that the command above will not change the file system level. The file system can be moved to the latest level with command

  mmchfs file-system-name -V full

In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. filesystem level is still at version 4.1.1)? That is expected to work.
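
Put together, once every node in the cluster is already running 4.2.3, the finishing steps would look roughly like this (a sketch only; fs1 is a placeholder file system name, and the mmlsconfig/mmlsfs checks are simply one way to verify the result):

    # raise the cluster level once all daemons are at 4.2.3
    mmchconfig release=LATEST

    # verify the new minimum release level of the cluster
    mmlsconfig minReleaseLevel

    # raise the file system format version; use "-V compat" instead of
    # "-V full" if lower-level remote clusters still need to mount fs1
    mmchfs fs1 -V full

    # verify the file system format version
    mmlsfs fs1 -V
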
Felipe

----
Felipe Knop
knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314  T/L 293-9314

From:   "Tomasz.Wolski at ts.fujitsu.com"
To:     "gpfsug-discuss at spectrumscale.org"
Date:   05/29/2017 04:24 PM
Subject:        [gpfsug-discuss] GPFS update path 4.1.1 -> 4.2.3
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________

Hello,

We are planning to integrate the new IBM Spectrum Scale version 4.2.3 into our software, but our current software release has version 4.1.1 integrated. We are worried about how node-at-a-time updates would look when our customer wants to update his cluster from version 4.1.1 to 4.2.3.

According to the "Concepts, Planning and Installation Guide" (for 4.2.3), there's limited compatibility between two GPFS versions, and if they're not adjacent, then the following update path is advised:

"If you want to migrate to an IBM Spectrum Scale version that is not an adjacent release of your current version (for example, version 4.1.1.x to version 4.2.3), you can follow this migration path depending on your current version: V 3.5 > V 4.1.0.x > V 4.1.1.x > V 4.2.0.x > V 4.2.1.x > V 4.2.2.x > V 4.2.3.x"

My question is: does the above statement still hold even though the nodes where the new GPFS 4.2.3 is installed will not be migrated to the latest release with "mmchconfig release=LATEST" until all nodes in the cluster have been updated to version 4.2.3? In other words: can two nodes, one with GPFS version 4.1.1 and the other with version 4.2.3, coexist in the cluster, where nodes have not been migrated to 4.2.3 (i.e. the filesystem level is still at version 4.1.1)?

Best regards,
Tomasz Wolski

With best regards / Mit freundlichen Grüßen / Pozdrawiam

Tomasz Wolski
Development Engineer
NDC Eternus CS HE (ET2)

FUJITSU
Fujitsu Technology Solutions Sp. z o.o.
Textorial Park Bldg C, ul. Fabryczna 17
90-344 Lodz, Poland

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 7182 bytes
Desc: image001.gif
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.gif
Type: image/gif
Size: 2774 bytes
Desc: image002.gif
URL:

From Tomasz.Wolski at ts.fujitsu.com  Wed May 31 11:00:02 2017
From: Tomasz.Wolski at ts.fujitsu.com (Tomasz.Wolski at ts.fujitsu.com)
Date: Wed, 31 May 2017 10:00:02 +0000
Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
Message-ID: <8b209bc526024c49a4a002608f354b3c@R01UKEXCASM223.r01.fujitsu.local>

Hello All,

It seems that GPFS 4.2.3 does not create a block device under /dev for new filesystems anymore - is this behavior intended? There is nothing mentioned about this change in the manuals.

For example, for GPFS filesystem gpfs100 with mountpoint /cache/100, /proc/mounts has the following entry:

gpfs100 /cache/100 gpfs rw,relatime 0 0

where in older releases it used to be

/dev/gpfs100 /cache/100 gpfs rw,relatime 0 0

Is there any option (e.g. supplied to mmcrfs) to still have these devices in /dev/ in version 4.2.3?
With best regards / Mit freundlichen Grüßen / Pozdrawiam

Tomasz Wolski
Development Engineer
NDC Eternus CS HE (ET2)

FUJITSU
Fujitsu Technology Solutions Sp. z o.o.
Textorial Park Bldg C, ul. Fabryczna 17
90-344 Lodz, Poland

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 2774 bytes
Desc: image001.gif
URL:

From Robert.Oesterlin at nuance.com  Wed May 31 12:13:01 2017
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 31 May 2017 11:13:01 +0000
Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
Message-ID: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com>

This was a documented change back in (I think) GPFS 4.2.0, but I'd have to go back over the old release notes. It can't be changed.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of "Tomasz.Wolski at ts.fujitsu.com"
Reply-To: gpfsug main discussion list
Date: Wednesday, May 31, 2017 at 5:00 AM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Missing gpfs filesystem device under /dev/

It seems that GPFS 4.2.3 does not create a block device under /dev for new filesystems anymore - is this behavior intended? There is nothing mentioned about this change in the manuals.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stockf at us.ibm.com  Wed May 31 12:25:13 2017
From: stockf at us.ibm.com (Frederick Stock)
Date: Wed, 31 May 2017 07:25:13 -0400
Subject: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
In-Reply-To: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com>
References: <48568693-5E00-4637-9AEE-BA63CC8DC47A@nuance.com>
Message-ID:

The change actually occurred in 4.2.1 to better integrate GPFS with systemd on RHEL 7.x.

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com

From:   "Oesterlin, Robert"
To:     gpfsug main discussion list
Date:   05/31/2017 07:13 AM
Subject:        Re: [gpfsug-discuss] Missing gpfs filesystem device under /dev/
Sent by:        gpfsug-discuss-bounces at spectrumscale.org

This was a documented change back in (I think) GPFS 4.2.0, but I'd have to go back over the old release notes. It can't be changed.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: on behalf of "Tomasz.Wolski at ts.fujitsu.com"
Reply-To: gpfsug main discussion list
Date: Wednesday, May 31, 2017 at 5:00 AM
To: "gpfsug-discuss at spectrumscale.org"
Subject: [EXTERNAL] [gpfsug-discuss] Missing gpfs filesystem device under /dev/

It seems that GPFS 4.2.3 does not create a block device under /dev for new filesystems anymore - is this behavior intended? There is nothing mentioned about this change in the manuals.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
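
For anything that used to key off the /dev node, the mapping from file system to mount point can still be looked up with the regular commands; for example (a sketch, reusing the gpfs100 name from the question above; verify the exact flags against your mmlsmount/mmlsfs documentation):

    # show where gpfs100 is mounted and on which nodes
    mmlsmount gpfs100 -L

    # show the configured default mount point for gpfs100
    mmlsfs gpfs100 -T

    # or simply read the entry from /proc/mounts (no /dev/ prefix since 4.2.1)
    grep '^gpfs100 ' /proc/mounts
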