<div dir="auto">Hi Keith,<div dir="auto"><br></div><div dir="auto"> We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues. </div><div dir="auto"><br></div><div dir="auto"> However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time, as with any round-robin-style data collection scheme. </div><div dir="auto"><br></div><div dir="auto">So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI, which uses a postgres db (iirc; about to take off on a flight and can't check), or looked at the Grafana bridge, which would get data into OpenTSDB format (again, iirc). Anyway, just some things for consideration, and a request to share back whatever you find out if it's off list.</div><div dir="auto"><br></div><div dir="auto">Thanks, getting the stink eye to go to airplane mode.</div><div dir="auto"><br></div><div dir="auto">More later.</div><div dir="auto"><br></div><div dir="auto">Cheers</div><div dir="auto">Kristy</div><div dir="auto"><br></div><div dir="auto"> </div><br><div class="gmail_extra" dir="auto"><br><div class="gmail_quote">On Sep 24, 2017 11:05 AM, "Keith Ball" <<a href="mailto:bipcuds@gmail.com">bipcuds@gmail.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div>Hello All,<br><br></div>In a recent
Spectrum Scale performance study, we used zimon/mmperfmon to gather
metrics. Over a period of two months, we lost data from the
zimon database twice: once after the virtual disk serving both the OS
files and the zimon collector/DB storage was resized, and a second time
after an unknown event (the loss was discovered when plotting in Grafana
only went back to a certain date and time; likewise, mmperfmon query
output only went back to the same time).<br><br></div>Details:<br></div>- Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients<br></div><div>-
Data retention in the "raw" stratum was set to 2 months; the "domains"
settings were as follows (note that we did not hit the ceiling of 60GB =
1GB/file * 60 files):</div><div><br></div><div>domains = {<br> # this is the raw domain<br> aggregation = 0 # aggregation factor for the raw domain is always 0.<br> ram = "12g" # amount of RAM to be used<br> duration = "2m" # amount of time that data with the highest precision is kept.<br> filesize = "1g" # maximum file size<br> files = 60 # number of files.<br>},<br>{<br> # this is the first aggregation domain that aggregates to 10 seconds<br> aggregation = 10<br> ram = "800m" # amount of RAM to be used<br> duration = "6m" # keep 10-second aggregates for 6 months.<br> filesize = "1g" # maximum file size<br> files = 10 # number of files.<br>},<br>{<br> # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes<br> aggregation = 30<br> ram = "800m" # amount of RAM to be used<br> duration = "1y" # keep 5-minute averages for 1 year.<br> filesize = "1g" # maximum file size<br> files = 5 # number of files.<br>},<br>{<br> # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours<br> aggregation = 24<br> ram = "800m" # amount of RAM to be used<br> duration = "2y" # keep 2-hour averages for 2 years.<br> filesize = "1g" # maximum file size<br> files = 5 # number of files.<br>}<br><br></div><div><br></div>Questions:<br><br></div>1.) Has anyone had similar issues with losing data from zimon?</div><div><br></div><div>2.)
Are there known circumstances where data could be lost, e.g. changing
the aggregation domain definitions, or even simply restarting the zimon
collector?</div><div><br></div><div>3.) Does anyone have any "best
practices" for backing up the zimon database? We were taking weekly
"snapshots" by shutting down the collector and making a tarball copy of
the /opt/ibm/zimon directory (but the database corruption/data loss
still crept through for various reasons).</div><div><br></div><div><br></div>In
terms of debugging, we do not have Scale or zimon logs going back to
the suspected dates of data loss; we do have a gpfs.snap from about a
month after the last data loss - would it have any useful clues? Opening
a PMR could be tricky, as it was the customer who has the support
entitlement, and the environment (specifically the old cluster
definition and the zimon collector VM) was torn down.<br clear="all"><div><br></div><div><br></div><div>Many Thanks,</div><div> Keith</div><font color="#888888"><div><br></div>-- <br>Keith D. Ball, PhD<br><div><div>RedLine Performance Solutions, LLC</div><div>web: <a href="http://www.redlineperf.com/" target="_blank">http://www.redlineperf.com/</a><br><div>email: <a href="mailto:aqualkenbush@redlineperf.com" target="_blank">kball@redlineperf.com</a></div></div></div>cell: <a href="tel:%28540%29%20557-7851" value="+15405577851" target="_blank">540-557-7851</a></font></div>
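[Editor's note: for reference, the effective resolution of each aggregation domain in the config quoted above can be derived by multiplying the aggregation factors together. The sketch below is not from the original mail; it assumes the raw domain samples once per second, and each later domain aggregates the previous one by its "aggregation" factor, as the config comments describe.]

```python
# Resolution implied by each zimon aggregation domain in the quoted config.
RAW_PERIOD = 1  # seconds (assumed sampling period of the raw domain)

# aggregation factors and durations copied from the config above
domains = [
    {"aggregation": 0,  "duration": "2m"},   # raw domain
    {"aggregation": 10, "duration": "6m"},   # 10 x 1 s = 10 s buckets
    {"aggregation": 30, "duration": "1y"},   # 30 x 10 s = 5 min buckets
    {"aggregation": 24, "duration": "2y"},   # 24 x 5 min = 2 h buckets
]

def resolutions(domains, raw_period=RAW_PERIOD):
    """Each domain aggregates the previous domain by its 'aggregation'
    factor; a factor of 0 marks the raw domain."""
    out, step = [], raw_period
    for d in domains:
        if d["aggregation"]:
            step *= d["aggregation"]
        out.append(step)
    return out

print(resolutions(domains))  # [1, 10, 300, 7200] -> 1 s, 10 s, 5 min, 2 h
```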
<br>______________________________<wbr>_________________<br>
gpfsug-discuss mailing list<br>
gpfsug-discuss at <a href="http://spectrumscale.org" rel="noreferrer" target="_blank">spectrumscale.org</a><br>
<a href="http://gpfsug.org/mailman/listinfo/gpfsug-discuss" rel="noreferrer" target="_blank">http://gpfsug.org/mailman/<wbr>listinfo/gpfsug-discuss</a><br>
<br></blockquote></div><br></div></div>