

This post has NOT been accepted by the mailing list yet.
Hello,
I am using this code to draw a trend line for data usage on a specific router interface. My problem is that I am not sure whether the line shows a correct prediction. It does not start at the beginning of the graph but, for some reason, 2 days later. Any comments on whether this implementation is correct would be appreciated.
Code:
DEF:dt_month=$RRD_FILE:dt_month:MAX \
VDEF:slope=dt_month,LSLSLOPE \
VDEF:int=dt_month,LSLINT \
CDEF:trend=dt_month,POP,COUNT,slope,*,int,+,0,INF,LIMIT \
LINE1:trend#00FF00:"Trend line \n" \
thanks


The limit you are putting on (0,INF,LIMIT) is clipping the trend at zero: any value outside that range becomes unknown and is not drawn. The line starts 2 days into the graph because that is when the trend crosses zero.
To test this, remove the limit, i.e.
CDEF:trend=dt_month,POP,COUNT,slope,*,int,+
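To make the clipping concrete, here is a small Python sketch of what 0,INF,LIMIT does to a straight line. It is only an illustration outside RRDtool: the slope and intercept are made-up numbers, and rpn_limit is a hypothetical helper that mimics LIMIT with inclusive bounds, which is an assumption about the boundary handling.

```python
import math


def rpn_limit(value, low, high):
    """Mimic RPN LIMIT: keep the value if it lies inside [low, high], else unknown (NaN)."""
    if math.isnan(value) or value < low or value > high:
        return math.nan
    return value


# Hypothetical line parameters (not taken from the thread): note the negative intercept.
slope, intercept = 10.0, -25.0

# COUNT is 1 for the first data point, 2 for the second, and so on.
trend = [slope * count + intercept for count in range(1, 7)]
clipped = [rpn_limit(v, 0.0, math.inf) for v in trend]

# The first COUNT position at which the clipped trend has a drawable value.
first_drawn = next(i for i, v in enumerate(clipped, start=1) if not math.isnan(v))

print(trend)
print(first_drawn)
```

With a negative intercept the first few positions are unknown, so the line only appears once the trend has climbed above zero.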


Hello,
Thanks for the reply. I tried your modification >> CDEF:trend=dt_month,POP,COUNT,slope,*,int,+
However, this has no effect. Also, in the first two days the value was 70 MBytes (each day had 35 MBytes), so there were non-zero values in the graph from the beginning. They are just not very visible because the scale of the graph is not adequate here.


I'm sure you have other limits on rrdgraph, probably set through command-line options, that limit the graph range.
Just looking at your graph I can see that the trend would be less than zero at the beginning of the graph, simply by following the line shown.
You can verify this by printing out the slope and the intercept. I would wager you will find that int (the starting y position on the graph) is negative.
PRINT:slope:'%lf'
PRINT:int:'%lf'
And now another, completely different point. You seem to be trying to predict disk usage. I have worked out a few tricks in this regard.
You need to realize that when you widen the time range of your graph you change your step size, which will affect the trend data. The step size will be set to the time that fits into one pixel. I have found through trial and error that changing your step size will change the trend result. You can see your actual step size using the following:
CDEF:c=dt_month,COUNT,EXC,POP
CDEF:t=dt_month,TIME,EXC,POP
VDEF:tmin=t,MINIMUM
VDEF:tmax=t,MAXIMUM
VDEF:cmax=c,MAXIMUM
CDEF:s=dt_month,POP,tmax,tmin,-,cmax,1,-,/
VDEF:stepwidth=s,MAXIMUM
PRINT:stepwidth:'%lf'
S will be a data set in which every value is the step size in seconds. There might be a better way to solve this, but this is what I worked out. I actually use this technique to produce trend values and the step size, and then solve the trend for the time when we will run out of disk space. This removes the need to widen the graph just to see the point where you run out of disk space. It also removes the need to look at a graph to spot what I call the danger zone, and gives you something that can actually trigger a warning. Example of Predict Disk script output is:
Reach 100% on Mon Oct 26 06:17:00 EDT 2015 Slope 110.40M per 60 seconds Correlation Coefficient 0.68
Data Analyzed Starting Mon Aug 10 12:24:00 EDT 2015  Ending Wed Aug 12 12:24:00 EDT 2015
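The Predict Disk script itself is not shown in this thread, so the following Python sketch is only a guess at the arithmetic behind output like the above. It assumes you already have the slope per step, the intercept, and the step size in seconds; time_to_reach and all the numbers are hypothetical.

```python
def time_to_reach(capacity, slope_per_step, intercept, step_seconds, count_now):
    """Seconds from now until the line slope*count + intercept reaches capacity.

    Returns None when the trend is flat or falling, since it then never gets there.
    """
    if slope_per_step <= 0:
        return None
    # Solve capacity = slope * count + intercept for count.
    count_at_capacity = (capacity - intercept) / slope_per_step
    remaining_steps = count_at_capacity - count_now
    return max(0.0, remaining_steps * step_seconds)


# Hypothetical numbers: 1000 units of capacity, growing 2 units per 60-second step,
# currently at step 200 of the analyzed window.
seconds_left = time_to_reach(
    capacity=1000.0,
    slope_per_step=2.0,
    intercept=100.0,
    step_seconds=60,
    count_now=200,
)
print(seconds_left)
```

Adding the result to the current time gives a date that a monitoring check can compare against a warning threshold. The answer scales with the step size, so the slope is only meaningful together with the step it was calculated on.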
That final graph is a bit off because last Friday I added a new disk array to a pool and then removed an old disk array from the pool, so the overall space changed in this time period.
Other insights into predicting disk usage can be seen in the link below. It is Nagios specific, but the shell script to predict disk overflow is generic and could be used with any system.
https://www.youtube.com/watch?v=7TdVTfp3cAs


Thank you for the comprehensive response. I am actually monitoring data traffic on the WAN0 interface, so the problem is not related to disk space monitoring. I also watched the YouTube presentation and found it very useful. You are right that my graph had limits set; when I removed them I got a graph like this, with negative values for the first two days. I admit I am not sure how your step size method can remedy my problem, namely that the trend line goes into the negative area. Can you explain further?


The Least Squares Line (LSL) in RRD just produces a line. Just a line. It doesn't bend. It takes all the data points you have and calculates the line that best approximates a path through all of them. The relatively high values at the end of your graph give your LSL trend a high slope, and that in turn produces the negative values at the beginning. There are better ways to forecast usage than LSL, but they also add complexity into the mix.
LSLSLOPE, LSLINT, LSLCORREL (from http://oss.oetiker.ch/rrdtool/doc/rrdgraph_rpn.en.html)
Return the parameters for a Least Squares Line (y = mx + b) which approximate the provided dataset. LSLSLOPE is the slope (m) of the line related to the COUNT position of the data. LSLINT is the y-intercept (b), which happens also to be the first data point on the graph. LSLCORREL is the Correlation Coefficient (also known as Pearson's Product Moment Correlation Coefficient). It will range from 0 to +/-1 and represents the quality of fit for the approximation.
You are right, step is not related to your issue, but step size will impact your results. It is just a warning not to stretch your graph's date range too far, to avoid averaging/watering down your data. If you use LSLCORREL (which I do) to validate the trend, watering down your data will also inflate the correlation coefficient and make it invalid.


OK, it is clear how LSL should work. But from the very beginning of the graph there are positive values (I double checked just now using rrd dump) and there are never any negative values, so I would still assume LSL takes these first two days into consideration. Right now it looks to me as if they are just ignored. In other words, between 4-6 of August there are always positive values at a 5 minute step.


Those initial values are not ignored; they are used in the least squares calculation. The problem is that all the data is taken into consideration: the initial few values near zero and the final values near 3000. To reach 3000, the slope is steep enough that the line needs to start in the basement. You can put the limit back on and the line will stop at zero. Unless your slope is zero, every line will eventually have a negative y. When your trend is "downhill" you will have a negative slope and the values will go negative at some point in the future. Bogus data, such as when you first start monitoring, can also skew your trend. There are graphs that show the same trend line, but common sense will tell you LSL is only accurate on the first.
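A tiny worked example may help here. The Python sketch below fits an ordinary least-squares line to made-up data that sits at zero and then climbs to 3000. It is not RRDtool's code: least_squares_line is a hypothetical helper, and it assumes x positions 1..n in the spirit of COUNT, which may differ from RRDtool's internal indexing.

```python
def least_squares_line(values):
    """Return (slope, intercept) of the least-squares line through (1, v1), (2, v2), ..."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Standard closed-form solution: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: flat near zero at first, then a steep rise at the end.
values = [0, 0, 0, 0, 1000, 3000]
slope, intercept = least_squares_line(values)

print(slope, intercept)
```

Every input value is zero or positive, yet the fitted intercept comes out negative, because a single straight line has to balance the flat start against the steep end.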
FYI, with the 30-day date range you are using on a graph 1900 pixels wide, I'm guessing the step used in your calculations is more like 1 hour.
Your data looks like it is in flux right now, so I would change your date range to trend only the last 48 hours; you will get a much different trend line.
On some of my graphs I use three trend lines that are hardcoded for immediate (1 day), longer (2 days), and even longer (a week), and they rarely agree. The link below shows how to hardcode this.
http://hints.jeb.be/2009/12/04/trendpredictionwithrrdtool/
BTW, if this is a WAN port, shouldn't the label be megabits/sec or some other rate, and not total megabytes?
In the graph below you can see that the bogus data from last week is skewing the trend down.
The one week trend here is bogus due to an equipment change.


Thank you again for the great explanation. My graph shows the total data consumed on the WAN0 interface, so not units/sec but the total megabytes that went over this interface in a range of 30 days.
Reading all this about LSL*, I think I should maybe reconsider using the LSL functions here and find another way of predicting.
The vnstat Linux tool gives me a prediction that is far different from LSL* (it currently predicts 9.81 GiB, whereas LSL gives almost 12 GiB).
Maybe using HRULE with the predicted value from vnstat would be a direction.


LSL trending is only good for trending linear data. I have trend lines on all my graphs by default and find them useful even when the data is not linear.
If the data you are trending is total MB in the last 30 days, the trend will be invalid until you have over 30 days of data.
Vnstat is great, and setting an HRULE from the estimate is a great idea. If it were me, I would still put an LSL trend line on the graph just as an aid.


On 08/13/2015 09:07 AM, Robert C. Seiwert wrote:
> Vnstat is great and setting an hrule from the estimate is a great idea. If it were me I would still put a LSLTrend line on the graph just as an aid.
Does anyone know or understand the algorithm vnstat uses to do its
prediction?


I decided to show the vnstat estimate as a line in my graph, and I am also considering keeping LSL. In the future I'll try to find some good combination.
But one more question: is it possible to draw a sloped line in RRD? Let's say the line would start at zero at the very beginning of the graph and end, at the very end, at a number that reflects the trend. A quick look would give users immediate awareness of where they are, even if the line does not perfectly follow the true utilisation.
I did not find a way to draw such a sloped line.


Anything you can calculate you can draw. If you can define it as an equation, you can put it in a CDEF.
Of course it's going to take some thought, an understanding of RPN, and a little trial and error.
At this point you would probably be better off using one of the graphs generated by vnstat.
Not knowing the exact nature of your data, I cannot say exactly what you want.
It seems you want to draw a line from 0,0 through the last point of your data; this is the prediction I think vnstat uses.
I can almost see the solution, but I don't want to work it out wrong here.
Something like (last value / seconds in the period up to the last value) * step width, then create a CDEF that equals this value * COUNT.
There may be a better way; if someone knows, please enlighten me.
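To check the arithmetic of that idea outside RRDtool, here is a rough Python sketch. The numbers are invented, origin_line_coefficient is a hypothetical helper, and whether vnstat really predicts this way is only a guess.

```python
def origin_line_coefficient(last_value, elapsed_seconds, step_seconds):
    """Per-step increment of a straight line that rises from 0 to last_value."""
    return last_value / elapsed_seconds * step_seconds


# Hypothetical numbers: 3000 MB consumed after 6 days, graphed with a 1 hour step.
elapsed_seconds = 6 * 86400
step_seconds = 3600
coef = origin_line_coefficient(3000.0, elapsed_seconds, step_seconds)

# Equivalent of CDEF:line=data,POP,coef,COUNT,* evaluated for each step so far.
steps_so_far = elapsed_seconds // step_seconds
line = [coef * count for count in range(1, steps_so_far + 1)]

print(coef)
print(line[-1])
```

The coefficient only produces the right line when step_seconds matches the step RRDtool actually uses for the graph. With a different step, the same coefficient gives a line that is too steep or too flat.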


I somehow ended up with the desired output. I used your equation to get a number and then processed it in a CDEF to get the final line.
CDEF:trend=dt_month,POP,$COEF,COUNT,* \
$COEF is my previously calculated number, which is close to "24" when you print it, but I still have doubts about that number. How do I get the "step width" that you mentioned? Playing with numbers, I found that "19500" as the step width gives the proper line direction, but I have no clue why 19500. This is more of a math question, but I would still appreciate your comment on it.
thanks


Wait, I actually sent an email with some crazy calculation to get it. There is a function called STEPWIDTH (used in a CDEF), but I think you need to be on a recent version of RRD.


Unfortunately, I did not receive any email from you. I tried to send you one and it ended up only in my own inbox; I am not sure whether it reached yours too.


I think I might have gotten bumped off the list, as I don't see my replies. Ondrej, hope this reaches you.
While your data may have a step of 300 secs, and you might even set the step size on the command line or in the RRD data source definition, this is not the step size used. The step used will be the number of seconds that fit in one pixel. You can force the step by controlling the image resolution and the time range, or just calculate the step from the image resolution and time range. In some of my scripts I force an image resolution of 4k so that the step size can be 60 secs. This actually allows me to use the rate of change in a meaningful way. If my LSLSLOPE is 10Mb and my step is 60, that's 10Mb/min.
Inside RRD I calculate the step width with the following.
CDEF:c=var1,COUNT,EXC,POP  The count of steps as a data set, from 1 to the end of the chart
CDEF:t=var1,TIME,EXC,POP  Time range of the graph as a data set
VDEF:tmin=t,MINIMUM  Starting time
VDEF:tmax=t,MAXIMUM  Ending time
VDEF:cmax=c,MAXIMUM  Count of steps plus one
CDEF:s=var1,POP,tmax,tmin,-,cmax,1,-,/  Data set of step width, calculated from (end time - start time) / (count - 1)
VDEF:stepwidth=s,MAXIMUM  Step width as a single value
You cannot do RPN math in a VDEF, so do the math in a data set (CDEF) and then pull a single value from it. Max or Min doesn't matter, because every value is the same.
RRD 1.5.4 has a function STEPWIDTH, which gives the width of the current step in seconds. This is a fairly recent version. You can use it to get the step size with fewer calculations.
CDEF:step=var1,STEPWIDTH,EXC,POP
VDEF:stepw=step,MAXIMUM
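For anyone who wants to sanity-check the manual calculation outside RRDtool, this is the same (end time - start time) / (count - 1) arithmetic as a small Python sketch. The timestamps are invented and step_width is a hypothetical helper, not part of any RRDtool binding.

```python
def step_width(timestamps):
    """Step in seconds: (last time - first time) / (number of points - 1)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two data points to measure a step")
    return (max(timestamps) - min(timestamps)) / (len(timestamps) - 1)


# Hypothetical graph data: 10 points spaced 300 seconds apart.
start = 1439200000
timestamps = [start + i * 300 for i in range(10)]

print(step_width(timestamps))
```

Ten points span nine intervals, which is why the divisor is the count minus one.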
Now if I could get someone to tell me how to GPRINT this as Days:Hours:Mins:Secs. When I look at a one-year graph I GPRINT the step width on the graph, but what is a step of 64.8k seconds? 7.5 days would be much more clear. That info, combined with a slope of say 10GB, tells me my rate of change is 10GB every week (roughly). Now if only I could get seconds to scale for time the way bytes do with SI units.
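I am not aware of a way to do this inside GPRINT, but if a wrapper script reads the value back via PRINT, the conversion is simple there. A minimal Python sketch follows, with format_duration being a made-up helper name.

```python
def format_duration(seconds):
    """Format a number of seconds as days, hours, minutes and seconds."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}h {minutes:02d}m {secs:02d}s"


print(format_duration(64800))
print(format_duration(648000))
```

For reference, 64,800 seconds comes out as 18 hours, while 648,000 seconds is 7 days 12 hours (7.5 days), so it is worth double-checking which order of magnitude the graph is actually reporting.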

