Question about trend line


Question about trend line

Ondrej
Hello,

I am using this code to draw a trend line for data usage on a specific router interface. My problem is that I am not sure whether the line shows a correct prediction: it does not start at the beginning of the graph but about two days later. Any comments on whether this implementation is correct would be appreciated.

Code:

DEF:dt_month=$RRD_FILE:dt_month:MAX \
VDEF:slope=dt_month,LSLSLOPE \
VDEF:int=dt_month,LSLINT \
CDEF:trend=dt_month,POP,COUNT,slope,*,int,+,0,INF,LIMIT \
LINE1:trend#00FF00:"Trend line \n"




Thanks

RE: Question about trend line

rseiwert

The LIMIT you are applying clips the trend at zero. The line starts two days into the graph because that is where the trend crosses zero.

 

To test this, remove the LIMIT:

CDEF:trend=dt_month,POP,COUNT,slope,*,int,+
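To see why nothing is drawn for the first couple of days, here is a minimal Python sketch of what that CDEF computes for each step. It assumes COUNT runs 1, 2, 3, ... across the graph (as the RRDtool RPN documentation describes COUNT) and uses None as a stand-in for UNKNOWN; the slope and intercept values are made up. With LIMIT, every negative trend value becomes UNKNOWN, so the line only appears once it crosses zero.

```python
def trend_points(n_steps, slope, intercept, limit=True):
    """Mimic CDEF:trend=dt_month,POP,COUNT,slope,*,int,+[,0,INF,LIMIT].

    COUNT is the 1-based position of each step on the graph. With limit=True,
    values outside [0, inf] become None (standing in for UNKNOWN), which
    RRDtool would not draw.
    """
    points = []
    for count in range(1, n_steps + 1):
        y = slope * count + intercept
        if limit and not (0 <= y <= float("inf")):
            y = None
        points.append(y)
    return points


def first_drawn_step(points):
    """Return the 1-based COUNT of the first value that would be drawn."""
    for count, y in enumerate(points, start=1):
        if y is not None:
            return count
    return None


# Hypothetical fit: negative intercept, positive slope.
clipped = trend_points(30, slope=100.0, intercept=-250.0)
print(first_drawn_step(clipped))  # 3: COUNT 1 and 2 give -150 and -50
unclipped = trend_points(30, slope=100.0, intercept=-250.0, limit=False)
print(unclipped[:3])  # [-150.0, -50.0, 50.0]
```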

 

In reply to this post by Ondrej (Wednesday, August 12, 2015 7:33 AM), which included this graph:
http://rrd-mailinglists.937164.n2.nabble.com/file/n7583077/trend_rrd.png


If you reply to this email, your message will be added to the discussion below:

http://rrd-mailinglists.937164.n2.nabble.com/Question-about-trend-line-tp7583077.html

To start a new topic under RRDtool Users Mailinglist, email [hidden email]

RE: Question about trend line

Ondrej
Hello,

Thanks for the reply. I tried your modification:

CDEF:trend=dt_month,POP,COUNT,slope,*,int,+

However, it has no effect. Also, during the first two days the value was 70 Mbytes (35 Mbytes each day), so there were non-zero values in the graph from the beginning. They are just not very visible because the graph's scale does not suit them.

RE: RE: Question about trend line

rseiwert

I'm sure you have other limits on the graph, probably set with command-line options, that restrict the graph's range.

Just by following the line shown in your graph back to the left edge, I can see that the trend would be below zero at the beginning of the graph.

You can verify this by printing out the slope and the intercept. I would wager you will find that INT (the starting y position on the graph) is negative.

PRINT:slope:'%lf'

PRINT:int:'%lf'

 

And now a completely different point. You seem to be trying to predict disk usage, and I have worked out a few tricks in this regard.

You need to realize that when you open up your graph's date range, you change your step size, which will affect the trend data. The step size will be set to the time that fits into one pixel. I have found through trial and error that changing your step size will change the trend result. You can see your actual step size using the following:

 

CDEF:c=dt_month,COUNT,EXC,POP
CDEF:t=dt_month,TIME,EXC,POP
VDEF:tmin=t,MINIMUM
VDEF:tmax=t,MAXIMUM
VDEF:cmax=c,MAXIMUM
CDEF:s=dt_month,POP,tmax,tmin,-,cmax,1,-,/
VDEF:stepwidth=s,MAXIMUM
PRINT:stepwidth:'%lf'

 

s will be a data set in which every value is the same: the step size in seconds. There might be a better way to solve this, but this is what I worked out. I actually use this technique to produce trend values and the step size so I can solve for when we will run out of disk space. This avoids having to open up the graph to see the point where you run out of disk space. It also avoids having to look at a graph to spot what I call the danger zone, and gives you something that can actually trigger a warning. Example output of the Predict Disk script:

Reach 100% on Mon Oct 26 06:17:00 EDT 2015 Slope 110.40M per 60 seconds Correlation Coefficient 0.68

Data Analyzed Starting Mon Aug 10 12:24:00 EDT 2015 - Ending Wed Aug 12 12:24:00 EDT 2015
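The Predict Disk script itself isn't shown in the thread, so this is only a hypothetical Python sketch of the arithmetic behind a line like "Reach 100% on ... Slope 110.40M per 60 seconds": divide the remaining space by the slope to get the number of steps left, multiply by the step width to get seconds, and add that to the time of the last sample. The function name predict_full and the example figures are made up.

```python
from datetime import datetime, timedelta


def predict_full(last_time, used, capacity, slope_per_step, step_seconds):
    """Return when usage reaches capacity if it keeps growing linearly.

    slope_per_step is the LSLSLOPE value (growth per step) and step_seconds
    is the step width actually used for the graph. Returns None when the
    trend is flat or shrinking, because the line never reaches capacity.
    """
    if slope_per_step <= 0:
        return None
    steps_left = (capacity - used) / slope_per_step
    return last_time + timedelta(seconds=steps_left * step_seconds)


# Hypothetical volume: 600 units free, growing 10 units per 60-second step.
start = datetime(2015, 8, 12, 12, 24)
print(predict_full(start, used=400, capacity=1000,
                   slope_per_step=10, step_seconds=60))
# 2015-08-12 13:24:00
```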

 

http://nagios.vcaglobal.com/nagiosxi/includes/components/perfdata/graphApi.php?host=iSCSIgroup&service=Disk_Usage&source=2&view=2&start=&end=&rand=1439396239

http://nagios.vcaglobal.com/nagiosxi/includes/components/perfdata/graphApi.php?host=iSCSIgroup&service=Predict_Disk&source=1&view=2&start=&end=&rand=1439396950

http://nagios.vcaglobal.com/nagiosxi/includes/components/perfdata/graphApi.php?host=iSCSIgroup&service=Disk_Usage&source=1&view=1&start=&end=&rand=1439397117

 

That final graph is a bit off because last Friday I added a new disk array to a pool and then removed an old disk array from the pool, so the overall space changed during this time period.

Other insights into predicting disk usage can be seen in the video below. It is Nagios-specific, but the shell script that predicts disk overflow is generic and could be used with any system.

https://www.youtube.com/watch?v=7TdVTfp3cAs

 

 

In reply to this post by Ondrej (Wednesday, August 12, 2015 11:46 AM)

RE: RE: Question about trend line

Ondrej
Thank you for the comprehensive response. I am actually monitoring data traffic on the WAN0 interface, so my problem is not related to disk space monitoring. I also watched the YouTube presentation and found it very useful. You are right that my graph had limits set; when I removed them, I got a graph with the trend line negative for the first two days. I admit I am not sure how your step size method can remedy my problem that the trend line goes into negative territory. Can you explain further?

RE: RE: RE: Question about trend line

rseiwert

The Least Squares Line (LSL) in RRD just produces a line. Just a line. It doesn't bend. It takes all the data points you have and calculates the straight line that best approximates them, the one with the smallest sum of squared vertical distances to the points. The relatively high values at the end of your graph give your LSL trend a steep slope, and with it the negative values at the beginning. There are better ways to forecast usage than LSL, but they also add complexity to the mix.

 

LSLSLOPE, LSLINT, LSLCORREL (from http://oss.oetiker.ch/rrdtool/doc/rrdgraph_rpn.en.html)

Return the parameters for a Least Squares Line (y = mx + b) which approximate the provided dataset. LSLSLOPE is the slope (m) of the line related to the COUNT position of the data. LSLINT is the y-intercept (b), which happens also to be the first data point on the graph. LSLCORREL is the Correlation Coefficient (also known as Pearson's Product Moment Correlation Coefficient). It will range from 0 to +/-1 and represents the quality of fit for the approximation.
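To make these definitions concrete, here is a small Python sketch of an ordinary least-squares fit using the textbook formulas. It is not RRDtool's source code: it assumes x = 0 for the first point (matching the documentation's remark that LSLINT is the line's value at the first data point) and that every value is a number; how RRDtool treats UNKNOWN values is not modeled.

```python
import math


def lsl(values):
    """Least-squares line y = m*x + b with x = 0 for the first point.

    Returns (slope, intercept, correlation), i.e. the quantities that
    LSLSLOPE, LSLINT and LSLCORREL describe. Needs at least two points
    and a non-constant series (otherwise the correlation divides by zero).
    """
    n = len(values)
    xs = range(n)
    sum_x = sum(xs)
    sum_y = sum(values)
    sum_xy = sum(x * y for x, y in zip(xs, values))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in values)

    cov_term = n * sum_xy - sum_x * sum_y
    var_x_term = n * sum_xx - sum_x ** 2
    var_y_term = n * sum_yy - sum_y ** 2
    slope = cov_term / var_x_term
    intercept = (sum_y - slope * sum_x) / n
    correl = cov_term / math.sqrt(var_x_term * var_y_term)
    return slope, intercept, correl


# Points lying exactly on y = 2x + 3 (x = 0..4): the fit recovers
# m = 2, b = 3 and a perfect correlation.
print(lsl([3, 5, 7, 9, 11]))  # (2.0, 3.0, 1.0)
```

A negative intercept together with a positive slope means the fitted line is below zero at the first point, which is the situation discussed in this thread.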

 

You are right, the step is not what causes your issue, but step size will affect your results. Just a warning: don't open your graph's date range too far, or your data will be averaged (watered down). If you use LSLCORREL to validate the trend (which I do), watered-down data will also inflate the correlation coefficient and make it invalid.
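A hypothetical Python illustration of the watering-down warning: a steadily rising series with alternating +5/-5 noise (made up so the effect is easy to see) is averaged into steps twice as wide, roughly what happens when more time has to fit into each pixel. The correlation goes up even though no information was added; with this particular noise, averaging pairs cancels it completely. The helper names are illustrative.

```python
import math


def correlation(values):
    """Pearson correlation between position (0, 1, 2, ...) and value."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    var_y = sum((y - mean_y) ** 2 for y in values)
    return cov / math.sqrt(var_x * var_y)


def consolidate(values, factor):
    """Average each group of `factor` consecutive points into one."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]


# Rising by 1 per step, plus noise that alternates +5, -5.
raw = [i + (5 if i % 2 == 0 else -5) for i in range(20)]
coarse = consolidate(raw, 2)
print(round(correlation(raw), 3), round(correlation(coarse), 3))  # 0.731 1.0
```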

 

In reply to this post by Ondrej (Wednesday, August 12, 2015 2:22 PM)

RE: RE: Question about trend line

Ondrej
In reply to this post by Ondrej
OK, it is clear how LSL should work, but from the very beginning of the graph there are positive values (I double-checked just now using rrdtool dump), and there are never any negative values, so I would still expect LSL to take these first two days into consideration. Right now it looks to me as if they are just ignored. In other words, between 4 and 6 August every 5-minute step has a positive value.

RE: RE: RE: Question about trend line

rseiwert

Those initial values are not ignored; they are used in the least squares calculation. The problem is that all the data is taken into consideration: the initial few values near zero and the final values near 3000. To get up to 3000, the slope is steep enough that the line has to start in the basement. You can put the LIMIT back on and the line will stop at zero. Unless the slope is zero, every line eventually reaches a negative y. When your trend is going "downhill" you have a negative slope, and the values will go negative at some point in the future. Bogus data, such as when you first start monitoring, can also skew your trend. The graphs at the link below all show the same trend line, but common sense will tell you LSL is only accurate for the first one.

http://reference.wolfram.com/applications/eda/NBMLImages/FittingDataToLinearModelsByLeast-SquaresTechniques/FittingDataToLinearModelsByLeast-SquaresTechniques_54.gif
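A small Python illustration of the point above, with made-up numbers shaped like the thread's graph (a cumulative total that stays small for a while and then climbs toward about 3000). Every value is positive and every value enters the fit, yet the best-fit line starts below zero. It uses the textbook least-squares formulas with x = 0 at the first point; fit_line and the data are hypothetical.

```python
def fit_line(values):
    """Ordinary least squares with x = 0, 1, 2, ... Returns (slope, intercept)."""
    n = len(values)
    xs = range(n)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(values)
    sum_xy = sum(x * y for x, y in zip(xs, values))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical cumulative totals: small at first, then a steep climb.
usage = [10, 20, 30, 1500, 3000]
slope, intercept = fit_line(usage)
print(slope, intercept)  # 746.0 -580.0
print(min(usage) > 0, intercept < 0)  # True True
```

Dropping the first value and refitting gives a different line, which shows the early points do take part; they just cannot outweigh the steep climb at the end.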

FYI, with the 30-day date range you are using on a graph 1900 pixels wide, I'm guessing the step used in your calculations is more like 1 hour.
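The back-of-the-envelope arithmetic behind that guess, as a Python sketch: divide the time range by the graph width to get the time each pixel has to cover. The step rrdtool actually uses also depends on which archive (RRA) it reads from, so treat this only as a rough starting point; seconds_per_pixel is a hypothetical helper.

```python
def seconds_per_pixel(range_seconds, width_pixels):
    """Time span that one pixel column of the graph has to cover."""
    return range_seconds / width_pixels


thirty_days = 30 * 24 * 3600  # 2,592,000 seconds
per_pixel = seconds_per_pixel(thirty_days, 1900)
print(round(per_pixel))  # 1364 seconds, i.e. about 23 minutes per pixel
```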

 

Since your data looks like it is in flux right now, I would change your date range to trend only the last 48 hours; you will get a very different trend line. On some of my graphs I use three trend lines hardcoded for the immediate term (1 day), longer (2 days), and even longer (a week), and they rarely agree. The link below shows how to hardcode this.

 

http://hints.jeb.be/2009/12/04/trend-prediction-with-rrdtool/
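The linked article's method isn't reproduced here. As a hypothetical Python illustration of why trend lines over different windows rarely agree, this fits a least-squares slope to only the most recent N points of a made-up series that was flat and then started climbing; slope_of and window_slopes are illustrative names, and each window needs at least two points.

```python
def slope_of(values):
    """Least-squares slope with x = 0, 1, 2, ... (growth per step)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def window_slopes(values, windows):
    """Slope of the trend over the last N points, for each N in windows."""
    return {n: slope_of(values[-n:]) for n in windows}


# Hypothetical series: flat for 8 steps, then rising by 10 per step.
series = [100] * 8 + [100 + 10 * i for i in range(1, 5)]
print({n: round(s, 2) for n, s in window_slopes(series, [4, 12]).items()})
# {4: 10.0, 12: 3.15}
```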

 

BTW, if this is a WAN port, shouldn't the label be megabits/sec or some other rate rather than total megabytes?

In the graph below you can see that the bogus data from last week is skewing the trend down.

 

The one week trend here is bogus due to an equipment change.

 

In reply to this post by Ondrej (Wednesday, August 12, 2015 3:52 PM)

RE: RE: RE: Question about trend line

Ondrej
Thank you again for the great explanation. My graph shows the total data consumed on the WAN0 interface, so not units/sec but the total megabytes that went over this interface within a 30-day range.

After reading all this about LSL*, I think I should reconsider using the LSL functions here and find another way of predicting.

The vnstat Linux tool gives me a prediction that is far different from LSL* (it currently predicts 9.81 GiB, whereas LSL gives almost 12 GiB).

Maybe using an HRULE with the predicted value from vnstat would be the way to go.

Re: Re: RE: RE: Question about trend line

rseiwert
LSL trending is only good for trending linear data. I have trend lines on all my graphs by default and find them useful even when the data is not linear.
If the data you are trending is the total MB over the last 30 days, the trend will be invalid until you have more than 30 days of data.
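A hypothetical Python sketch of why the trend is invalid before 30 days of data exist: with perfectly steady daily usage, a rolling 30-day total still climbs for the first 30 days simply because the window is filling up, and only then goes flat. A trend fitted to that ramp would predict growth that isn't there. rolling_totals and the 35 MB/day figure are illustrative.

```python
def rolling_totals(daily, window=30):
    """Total of the last `window` days at each day (fewer days early on)."""
    totals = []
    for day in range(len(daily)):
        start = max(0, day - window + 1)
        totals.append(sum(daily[start:day + 1]))
    return totals


# Hypothetical: a constant 35 MB every day for 40 days.
totals = rolling_totals([35] * 40)
print(totals[0], totals[14], totals[29], totals[39])  # 35 525 1050 1050
```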

vnstat is great, and setting an HRULE from its estimate is a great idea. If it were me, I would still put an LSL trend line on the graph just as an aid.

In reply to this post by Ondrej (Thursday, August 13, 2015 7:36 AM)

_______________________________________________
rrd-users mailing list
[hidden email]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users

Re: Re: RE: RE: Question about trend line

Steve Clark
Replying to Robert C. Seiwert's message of 08/13/2015 09:07 AM:
Does anyone know or understand the algorithm vnstat uses to do its prediction?




--
Stephen Clark
NetWolves Managed Services, LLC.
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: [hidden email]
http://www.netwolves.com

Re: Re: RE: RE: Question about trend line

Ondrej
I decided to put the vnstat estimate line in my graph, and I am also considering adding LSL. In the future I'll try to find a good combination.

But one more question: is it possible to draw a skewed (sloped) line in RRD? Let's say the line would start at zero at the very beginning of the graph and, at the very end, reach a value that reflects the trend. At a quick look this would give users immediate awareness of where they are, even if the line does not perfectly follow the true utilisation.

I did not find a way to draw such a line.

RE: Re: Re: RE: RE: Question about trend line

rseiwert

Anything you can calculate, you can draw. If you can define it as an equation, you can put it in a CDEF.

Of course, it's going to take some thought, an understanding of RPN, and a little trial and error.

At this point you would probably be better off using one of the graphs generated by vnstat.

Not knowing the exact nature of your data, I cannot say exactly what you want.

It seems you want to draw a line from (0,0) through the last point of your data; I think this is the prediction vnstat uses.

I can almost see the solution, but I don't want to work it out wrong here.

Something like (last value / seconds in the period up to the last value) * step width, then create a CDEF that equals this value * COUNT.

There may be a better way, if someone knows please enlighten me.
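One possible reading of that suggestion, sketched in Python: the per-second rate is the last value divided by the seconds elapsed, multiplying by the step width gives the growth per step, and multiplying that by COUNT traces a line from zero up to the last value. It assumes the last value is the running total at the final step and that elapsed time is counted from one step before the first point (COUNT = 0); origin_trend and the numbers are hypothetical.

```python
def origin_trend(last_value, elapsed_seconds, step_seconds, n_steps):
    """Line through (0, 0) and the last point, as a value*COUNT CDEF would draw it.

    rate_per_second * step_seconds is the growth per step, i.e. the
    coefficient that gets multiplied by COUNT (1, 2, ..., n_steps).
    """
    rate_per_second = last_value / elapsed_seconds
    coef = rate_per_second * step_seconds
    return coef, [coef * count for count in range(1, n_steps + 1)]


# Hypothetical: 1800 MB after 4 steps of 1800 s each.
coef, line = origin_trend(1800.0, 4 * 1800, 1800, 4)
print(coef, line)  # 450.0 [450.0, 900.0, 1350.0, 1800.0]
```

With elapsed_seconds equal to n_steps * step_seconds, the step width cancels and the coefficient is simply last_value / n_steps; the step width only matters when the rate comes from a different time span.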

 

In reply to this post by Ondrej (Thursday, August 13, 2015 4:28 PM)

RE: Re: Re: RE: RE: Question about trend line

Ondrej
I somehow ended up with the desired output: I used your equation to get a number and processed it in a CDEF to get the final line.

CDEF:trend=dt_month,POP,$COEF,COUNT,* \

$COEF is my previously calculated number, which is close to 24 when printed, but I still have doubts about that number. How do I get the "step width" you mentioned? Playing with numbers, I found that 19500 as the step width gives the proper line direction, but I have no clue why 19500. This is more of a math question, but I would still appreciate your comment on it.
Thanks

RE: RE: Re: Re: RE: RE: Question about trend line

rseiwert
Wait, I actually sent an email with a somewhat crazy calculation to get it. There is an RPN function called STEPWIDTH, but I think you need to be on a recent version of RRDtool.

In reply to this post by Ondrej (8/14/2015 5:06 PM)

RE: RE: Re: Re: RE: RE: Question about trend line

Ondrej
Unfortunately, I did not receive any email from you. I tried to send you one, and it only ended up in my own inbox; I'm not sure whether it reached yours too.

RE: Re: Re: RE: RE: Question about trend line

rseiwert
In reply to this post by Ondrej
I think I might have gotten bumped off the list, as I don't see my replies. Ondrej, I hope this reaches you.
While your data may have a step of 300 seconds, and you might even set the step size on the command line or in the DEF for the RRD data source, this is not the step size used. The step used will be the number of seconds that fit into one pixel. You can force the step by controlling the image resolution and the time range, or just calculate the step from the image resolution and time range. In some of my scripts I force an image width of 4k so that the step size can be 60 seconds. This actually allows me to use the rate of change in a meaningful way: if my LSLSLOPE is 10Mb and my step is 60, that's 10Mb/min.
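A rough Python sketch of the arithmetic in that paragraph, assuming (as described there) that the step is the time range divided by the graph width in pixels. "4k" is taken to mean 3840 pixels wide, which is an assumption; width_for_step and per_minute are hypothetical helpers.

```python
def width_for_step(range_seconds, step_seconds):
    """Pixels needed so that one pixel covers step_seconds."""
    return range_seconds // step_seconds


def per_minute(slope_per_step, step_seconds):
    """Convert a slope measured per step into a rate per minute."""
    return slope_per_step * 60 / step_seconds


print(width_for_step(48 * 3600, 60))  # 2880 pixels for 48 hours at 60 s
print(3840 * 60 / 3600)               # 64.0 hours fit in 3840 pixels at 60 s
print(per_minute(10, 60))             # 10.0 (e.g. 10Mb per minute)
```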

Inside RRD I calculate the stepwidth with the following.

CDEF:c=var1,COUNT,EXC,POP
CDEF:t=var1,TIME,EXC,POP
VDEF:tmin=t,MINIMUM
VDEF:tmax=t,MAXIMUM
VDEF:cmax=c,MAXIMUM
CDEF:s=var1,POP,tmax,tmin,-,cmax,1,-,/
VDEF:stepwidth=s,MAXIMUM

c: the step count as a data set, from 1 to the end of the chart
t: the time of each step as a data set
tmin: the starting time
tmax: the ending time
cmax: the number of steps (one more than the number of intervals between them)
s: the step width as a data set, calculated as (end time - start time) / (count - 1)
stepwidth: the step width as a single value

You cannot do RPN math in a VDEF, so do the math in a data set (CDEF) and then pull out a single value. MAXIMUM or MINIMUM doesn't matter, because every value is the same.
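The same arithmetic outside RRDtool, as a Python sketch: given the timestamps of the steps (what the CDEF t holds), the step width is (end time - start time) / (number of points - 1). step_width and the timestamps are hypothetical.

```python
def step_width(timestamps):
    """Step width in seconds from evenly spaced step timestamps.

    Mirrors CDEF:s=...,tmax,tmin,-,cmax,1,-,/ where cmax is the number of
    points (the COUNT of the last step).
    """
    tmin, tmax = min(timestamps), max(timestamps)
    cmax = len(timestamps)
    return (tmax - tmin) / (cmax - 1)


# Hypothetical graph: 5 points, one every 300 seconds.
times = [1439380800 + 300 * i for i in range(5)]
print(step_width(times))  # 300.0
```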

RRDtool 1.5.4 has an RPN function, STEPWIDTH, which gives the width of the current step in seconds. This is a fairly recent version. You can use it to get the step size with fewer calculations.
CDEF:step=var1,STEPWIDTH,EXC,POP
VDEF:stepw=step,MAXIMUM

Now if only someone could tell me how to GPRINT this as Days:Hours:Mins:Secs. When I look at a one-year graph I GPRINT the step width on the graph, but what is a step of 648k seconds? 7.5 days would be much clearer. That, combined with a slope of, say, 10GB, tells me my rate of change is 10GB every week (roughly). If only seconds could scale into larger time units the way bytes scale with SI prefixes.
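There is no GPRINT recipe for this in the thread, so here is only a Python sketch of the formatting being asked for, for example in a script that post-processes a PRINTed step width; format_duration is a hypothetical helper. 648,000 seconds comes out as 7 days 12 hours (7.5 days), while 64,800 seconds would be 18 hours.

```python
def format_duration(seconds):
    """Format a number of seconds as 'Dd HH:MM:SS'."""
    seconds = int(round(seconds))
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


print(format_duration(648000))  # 7d 12:00:00 (7.5 days)
print(format_duration(64800))   # 0d 18:00:00
```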