Bug #2565

Graph not dealing well with integer values

Added by Cezar Gasca over 2 years ago. Updated over 2 years ago.

Status: New
Start date: 07/18/2011
Priority: Normal
Due date:
Assignee: Julien Mathis
% Done: 0%
Category: CentStorage daemon
Target version: Centreon-2.2.3
Resolution:

Description

Hi guys,

Recently I discovered some weird behavior in the graphs. When monitoring integer parameters
(like the number of users connected to an application or the number of used B channels on a T1/E1 line), the result is not
satisfying at all. Even though the plugin always returns integer values as performance data, the graphed values
are decimal fractions that differ considerably from the ones returned by the plugin, which makes the graph useless (having
3.4 users connected or 5.7 B channels used as the last value makes no sense).
So I started my own investigation. It seems that the information stored in mysql is different from the
information stored in rrd. Please see the example below.
-------------------------------------------------------------------
mysql> select * from data_bin where id_metric='293' order by ctime;
...............
: 293 : 1310964693 : 0 : 0 :
: 293 : 1310964993 : 1 : 0 :
: 293 : 1310965293 : 0 : 0 :
: 293 : 1310965595 : 0 : 0 :
: 293 : 1310965895 : 0 : 0 :
: 293 : 1310966195 : 2 : 0 :
: 293 : 1310966497 : 1 : 0 :
: 293 : 1310966797 : 2 : 0 :
: 293 : 1310967097 : 1 : 0 :
: 293 : 1310967397 : 0 : 0 :
: 293 : 1310967697 : 0 : 0 :
: 293 : 1310967997 : 3 : 0 :
---------------------+-------+--------+

[root@test metrics]# rrdtool fetch 293.rrd AVERAGE
................
1310964900: 2.0000000000e-01
1310965200: 4.0000000000e-01
1310965500: 0.0000000000e+00
1310965800: 0.0000000000e+00
1310966100: 1.2000000000e+00
1310966400: 1.6000000000e+00
1310966700: 1.2000000000e+00
1310967000: 1.4000000000e+00
1310967300: 6.0000000000e-01
1310967600: 0.0000000000e+00
1310967900: 2.4000000000e+00
1310968200: 2.4000000000e+00
-------------------------------------------------------------------

At first I thought there might be a problem with the way the rrd files are created. After some time I
understood that the issue resides in the rrd update process. The "problem" is that the rrd files are updated using the
timestamp at which the data was gathered.
So, at the moment, if you compare the mysql data with the data stored in rrd, you get different timestamps and different values.
As a workaround and a compromise, I think a good idea would be to update the rrd file using the next fixed
interval boundary. To explain a bit more: instead of updating the rrd file with the timestamp 1310967997, we can update
it with the timestamp 1310968200 (the next multiple of the step). Then you still have different
timestamps, but the same values as in mysql, and the graphs are useful again.
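
The rounding proposed above can be sketched as follows. This is only an illustration, not Centreon or centstorage code: the function name `next_step_boundary` is hypothetical, and the 300-second step is an assumption based on the 5-minute spacing of the rrdtool output above.

```python
def next_step_boundary(timestamp: int, step: int) -> int:
    """Round a Unix timestamp up to the next multiple of the RRD step.

    A timestamp that already lies on a boundary is returned unchanged.
    """
    if step <= 0:
        raise ValueError("step must be a positive number of seconds")
    # Ceiling division using integer arithmetic only.
    return -(-timestamp // step) * step


# Timestamp taken from the data_bin rows above; step assumed to be 300 s.
print(next_step_boundary(1310967997, 300))  # prints 1310968200
```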

History

#1 Updated by Tensibai - Bastien Jove over 2 years ago

If you look closer you'll find the rrd timestamps are correct (in terms of interval, I mean: every 300 sec, so 5 mins). And they fall on 5-minute boundaries (14h00, 14h05, etc.).

Those in mysql are the real check end timestamps, and so are not aligned to those boundaries.

That's the way rrdtool works: it consolidates values by averaging, so when you update the rrd DB it averages the value to cover the nearest next period (and, if the update is too close to the last period, updates that one as well).
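
The averaging described above can be illustrated with a simplified model. This is a sketch, not rrdtool's actual implementation: it assumes a GAUGE data source, where each update's value is taken to apply to the whole interval since the previous update, and it ignores heartbeat, unknown data and multi-step consolidation, so it is not expected to reproduce the exact `rrdtool fetch` output in the description. The function name `gauge_averages` is hypothetical.

```python
def gauge_averages(updates, step):
    """Time-weighted average per step bucket for (timestamp, value) updates.

    The first tuple only marks the starting time; its value is not used.
    Only buckets that are completely covered by updates are returned,
    keyed by the timestamp at which the bucket ends.
    """
    totals = {}   # bucket end -> sum of value * seconds
    covered = {}  # bucket end -> number of seconds with known data
    for (t0, _), (t1, value) in zip(updates, updates[1:]):
        t = t0
        while t < t1:
            bucket_end = (t // step + 1) * step
            segment_end = min(bucket_end, t1)
            seconds = segment_end - t
            totals[bucket_end] = totals.get(bucket_end, 0.0) + value * seconds
            covered[bucket_end] = covered.get(bucket_end, 0) + seconds
            t = segment_end
    return {end: totals[end] / step
            for end in sorted(totals) if covered[end] == step}


# Real check end timestamps (first rows of the data_bin output), step assumed 300 s.
unaligned = [(1310964693, 0), (1310964993, 1), (1310965293, 0)]
# The same values submitted on step boundaries instead.
aligned = [(1310964900, 0), (1310965200, 1), (1310965500, 0)]

print(gauge_averages(unaligned, 300))
print(gauge_averages(aligned, 300))
```

With the unaligned timestamps, the bucket ending at 1310965200 comes out fractional, because only 93 of its 300 seconds carry the value 1. With the aligned timestamps, each bucket holds exactly the value that was submitted.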

I don't really see a correct way to do this, as the rounding is done by rrdtool at the moment of writing. Having correct timestamps in mysql could maybe do the trick, but it involves knowing the check interval to create the correct timestamp, and as far as I know rrd won't accept a future timestamp, so the centstorage process would have to keep track of the values before writing them.

Currently this is not done because it is not necessary in order to insert values into rrd; the rrd library does the trick and sometimes gives values that are not so accurate but near reality.

Last point: doing a check every 5 mins for a current counter means missing some high or low values occurring in that period. Everything between the two checks is unknown and won't ever be known; that's why rrd works with average consolidation and never with exact values.

#2 Updated by Cezar Gasca over 2 years ago

Hello Bastien,

First, thanks for taking the time to look at this.
I never stated that the rrd timestamps are not correct. I also know that the timestamps in
mysql are the real check end timestamps, and that when the rrd is updated with real check end
timestamps, rrd consolidates the data by averaging and displays it at fixed
intervals. By the way, it is possible to update an rrd file with future timestamps. RRDTool only makes sure
that the timestamp of the new value is higher than the last update time. (Please see my example below, where
I was able to update an rrd file with future timestamps.)
Regarding your last point: indeed, doing a check every 5 minutes to monitor a current counter is not
best practice, because you can have a lot of spikes between two checks, but doing it at a smaller
interval (30 sec.) gives you a better overview of the trend. For example, if you want
to monitor the used B channels on a T1/E1, doing checks every 30 sec. can give you some idea of the load
(by period of the day), because a call lasts on average more than 30 sec. And I can give you other examples
like this one where it is useful to monitor integer values.

[root@test ~]# date +%s
1311232883

[root@test ~]# rrdtool info 298.rrd
filename = "298.rrd"
rrd_version = "0003"
step = 600
last_update = 1311232800
......

[root@test ~]# rrdtool fetch 298.rrd AVERAGE
........
1311232200: 1.5000000000e+01
1311232800: 2.2000000000e+01
1311233400: nan

[root@test ~]# rrdtool update 298.rrd 1311233400:85 1311234000:30

[root@test ~]# rrdtool fetch 298.rrd AVERAGE --start 1311232200 --end 1311234000

1311232800: 2.2000000000e+01
1311233400: 8.5000000000e+01
1311234000: 3.0000000000e+01
1311234600: nan

[root@test ~]# date +%s
1311233219

[root@test ~]# rrdtool info 298.rrd
filename = "298.rrd"
rrd_version = "0003"
step = 600
last_update = 1311234000
......
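
The rule demonstrated by the transcript above (an update is accepted as long as its timestamp is later than the last update, even if it lies in the future) can be sketched as follows. This is an illustration only: the helper names `parse_rrd_info` and `updates_acceptable` are hypothetical, and the parser handles only the plain integer fields shown in the `rrdtool info` output above.

```python
import re


def parse_rrd_info(text: str) -> dict:
    """Extract integer fields such as 'step' and 'last_update' from
    the text printed by `rrdtool info`. Quoted fields are skipped."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\w+)\s*=\s*(\d+)\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def updates_acceptable(timestamps, last_update: int) -> bool:
    """Return True if every timestamp is later than the one before it,
    starting from last_update."""
    previous = last_update
    for ts in timestamps:
        if ts <= previous:
            return False
        previous = ts
    return True


# Sample text copied from the `rrdtool info 298.rrd` output above.
info = parse_rrd_info('''filename = "298.rrd"
rrd_version = "0003"
step = 600
last_update = 1311232800
''')
print(info)
# The two timestamps used in the `rrdtool update` call above.
print(updates_acceptable([1311233400, 1311234000], info["last_update"]))
```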

#3 Updated by Tensibai - Bastien Jove over 2 years ago

RRD minimum update time is 60 secs (that's why centreon uses *60 in configuration), and doing something more often would create a big network and server overhead on the monitoring target.

The main point I was talking about is: you must know the check interval to calculate the next correct rrd time.
This is not trivial and adds overhead, as currently this is not needed to update an rrd DB (see man rrdupdate).

http://forums.cacti.net/about33897.html could help to understand the 'problem'.

From my point of view this is not a bug.
Your graphs are not unusable, they're just 'near' the real values; you may round them yourself while viewing.

Maybe someone has an idea for a workaround/enhancement with very little overhead, but that's not me, sorry ;)

#4 Updated by Julien Mathis over 2 years ago

  • Target version changed from Centreon 2.2.2 to Centreon-2.2.3
