<div dir="auto">Hi Keith,<div dir="auto"><br></div><div dir="auto"> We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues. </div><div dir="auto"><br></div><div dir="auto"> However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time, as with any round-robin-style data collection scheme. </div><div dir="auto"><br></div><div dir="auto">So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI, which uses a postgres db (iirc; about to take off on a flight and can't check), or looked at the Grafana bridge, which would get data into OpenTSDB format (again, iirc). Anyway, just some things for consideration, and a request to share back whatever you find out if it's off list.</div><div dir="auto"><br></div><div dir="auto">Thanks, getting the stink eye to go to airplane mode.</div><div dir="auto"><br></div><div dir="auto">More later.</div><div dir="auto"><br></div><div dir="auto">Cheers</div><div dir="auto">Kristy</div><div dir="auto"><br></div><div dir="auto"> </div><br><div class="gmail_extra" dir="auto"><br><div class="gmail_quote">On Sep 24, 2017 11:05 AM, "Keith Ball" <<a href="mailto:bipcuds@gmail.com">bipcuds@gmail.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div>Hello All,<br><br></div>In a recent
Spectrum Scale performance study, we used zimon/mmperfmon to gather
metrics. Over a period of two months, we lost data from the
zimon database twice: once after the virtual disk serving both the OS
files and the zimon collector/DB storage was resized, and a second time
after an unknown event (the loss was discovered when plotting in Grafana
only went back to a certain date and time; likewise, mmperfmon query
output only went back to the same time).<br><br></div>Details:<br></div>- Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients<br></div><div>-
Data retention in the "raw" stratum was set to 2 months; the "domains"
settings were as follows (note that we did not hit the ceiling of 60GB =
1GB/file * 60 files):</div><div><br></div><div>domains = {<br> # this is the raw domain<br> aggregation = 0 # aggregation factor for the raw domain is always 0.<br> ram = "12g" # amount of RAM to be used<br> duration = "2m" # amount of time that data with the highest precision is kept.<br> filesize = "1g" # maximum file size<br> files = 60 # number of files.<br>},<br>{<br> # this is the first aggregation domain that aggregates to 10 seconds<br> aggregation = 10<br> ram = "800m" # amount of RAM to be used<br> duration = "6m" # keep 10-second aggregates for 6 months.<br> filesize = "1g" # maximum file size<br> files = 10 # number of files.<br>},<br>{<br> # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes<br> aggregation = 30<br> ram = "800m" # amount of RAM to be used<br> duration = "1y" # keep 5-minute averages for 1 year.<br> filesize = "1g" # maximum file size<br> files = 5 # number of files.<br>},<br>{<br> # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours<br> aggregation = 24<br> ram = "800m" # amount of RAM to be used<br> duration = "2y" # keep 2-hour averages for 2 years.<br> filesize = "1g" # maximum file size<br> files = 5 # number of files.<br>}<br><br></div><div><br></div>Questions:<br><br></div>1.) Has anyone had similar issues with losing data from zimon?</div><div><br></div><div>2.)
Are there known circumstances where data could be lost, e.g. changing
the aggregation domain definitions, or even simply restarting the zimon
collector?</div><div><br></div><div>3.) Does anyone have any "best
practices" for backing up the zimon database? We were taking weekly
"snapshots" by shutting down the collector and making a tarball copy of
the /opt/ibm/zimon directory (but the database corruption/data loss
still crept through for various reasons).</div><div><br></div><div><br></div>In
terms of debugging, we do not have Scale or zimon logs going back to
the suspected dates of data loss; we do have a gpfs.snap from about a
month after the last data loss - would it have any useful clues? Opening
a PMR could be tricky, as it was the customer who has the support
entitlement, and the environment (specifically the old cluster
definition and the zimon collector VM) was torn down.<br clear="all"><div><br></div><div><br></div><div>Many Thanks,</div><div> Keith</div><font color="#888888"><div><br></div>-- <br>Keith D. Ball, PhD<br><div><div>RedLine Performance Solutions, LLC</div><div>web: <a href="http://www.redlineperf.com/" target="_blank">http://www.redlineperf.com/</a><br><div>email: <a href="mailto:aqualkenbush@redlineperf.com" target="_blank">kball@redlineperf.com</a></div></div></div>cell: <a href="tel:%28540%29%20557-7851" value="+15405577851" target="_blank">540-557-7851</a></font></div>
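[Editor's note: for reference, the effective resolution of each aggregation domain in the config quoted above can be derived by multiplying the aggregation factors together. The sketch below is not from the original mail; it assumes the raw domain samples once per second, and each later domain aggregates the previous one by its "aggregation" factor, as the config comments describe.]

```python
# Resolution implied by each zimon aggregation domain in the quoted config.
RAW_PERIOD = 1  # seconds (assumed sampling period of the raw domain)

# aggregation factors and durations copied from the config above
domains = [
    {"aggregation": 0,  "duration": "2m"},   # raw domain
    {"aggregation": 10, "duration": "6m"},   # 10 x 1 s = 10 s buckets
    {"aggregation": 30, "duration": "1y"},   # 30 x 10 s = 5 min buckets
    {"aggregation": 24, "duration": "2y"},   # 24 x 5 min = 2 h buckets
]

def resolutions(domains, raw_period=RAW_PERIOD):
    """Each domain aggregates the previous domain by its 'aggregation'
    factor; a factor of 0 marks the raw domain."""
    out, step = [], raw_period
    for d in domains:
        if d["aggregation"]:
            step *= d["aggregation"]
        out.append(step)
    return out

print(resolutions(domains))  # [1, 10, 300, 7200] -> 1 s, 10 s, 5 min, 2 h
```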
<br>______________________________<wbr>_________________<br>
gpfsug-discuss mailing list<br>
gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a><br>
<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/<wbr>listinfo/gpfsug-discuss</a><br>
<br></blockquote></div><br></div></div>