Friday, June 21, 2013

Ceres essentials

I think it would be nice to have an idea of the Ceres storage format.

First, create a Ceres tree:

export CERES_TREE=/tmp/storage/ceres
ceres-tree-create --verbose $CERES_TREE

ls -la $CERES_TREE
drwxr-xr-x 2 tmp tmp 4096 Jun 20 22:38 .ceres-tree

Only a directory has been created, nothing more.
This is the top-level directory; all Ceres nodes are created under it.

Create a new node (a Graphite metric):

export NODE=test.item
ceres-node-create --tree $CERES_TREE --step 10 $NODE
'--step 10' sets the time interval, in seconds, between two consecutive datapoints.
The node directory has been created, with a '.ceres-node' file:

ls -la $CERES_TREE/test/item
-rw-rw-r-- 1 tmp tmp 16 Jun 20 22:48 /tmp/storage/ceres/test/item/.ceres-node

cat $CERES_TREE/test/item/.ceres-node
{"timeStep": 10}
'.ceres-node' is a special file that stores node metadata as a JSON string.
(In Whisper, metadata is stored at the very beginning of the .wsp file.)

By default, it contains only the timeStep value, which Ceres requires to read and write datapoints correctly, but it can be any valid JSON string, for example:

{"timeStep": 10, "location":"earth", "saturn":["no","more","drinks"]}
or (a real example from megacarbon):
{"timeStep": 10, "retentions": [[10, 8640], [60, 43200], [300, 105120]], "xFilesFactor": 0.5, "aggregationMethod": "average"}
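To make the metadata format concrete, here is a minimal Python sketch that parses such strings with the standard json module. It does not use the Ceres library; the helper name read_time_step is made up for illustration.

```python
import json


def read_time_step(metadata_text):
    """Parse a '.ceres-node'-style JSON string and return its timeStep."""
    metadata = json.loads(metadata_text)
    return metadata["timeStep"]


# The default metadata and the example with extra keys both carry timeStep.
default_meta = '{"timeStep": 10}'
custom_meta = '{"timeStep": 10, "location":"earth", "saturn":["no","more","drinks"]}'

print(read_time_step(default_meta))
print(read_time_step(custom_meta))
# Extra keys are simply additional entries in the parsed dictionary.
print(sorted(json.loads(custom_meta)))
```

The extra keys do not get in the way of reading timeStep, which is the only value the sketch relies on.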
If we know a node name, we can check whether it exists within our Ceres tree (wildcards are allowed):

$ ceres-tree-find $CERES_TREE ${NODE}
Usually we want the filesystem path rather than the node path:

$ ceres-tree-find $CERES_TREE ${NODE} --fspath
OK, we created a node, but it is empty: it has no data. It's time to write some datapoints.

$ ceres-node-write --tree $CERES_TREE $NODE N:1
Each datapoint is of the form <timestamp>:<value>, where <timestamp> may be a UNIX epoch time or the character 'N' to indicate 'now'. An actual datapoint (1371759490, 1.0) has been stored. We passed the integer 1, but Ceres always stores floats, so it was converted internally to 1.0. Now take a look at the newly created file:
ls -l `ceres-tree-find $CERES_TREE ${NODE} --fspath`
-rw-r--r-- 1 tmp tmp 8 Jun 20 23:18 1371759490@10.slice
This is a slice file. Ceres stores node datapoints in so-called 'slices'.
There may be one slice, hundreds, or even thousands; it depends mostly on how often new datapoints are stored and on the node's timeStep (these are Ceres internals and we generally should not worry about them).
Note the slice size: it is 8 bytes long. Every datapoint is a float, and it takes 8 bytes to store one datapoint.
Ceres does not store timestamps; it stores only data values, one value per 'timeStep' period.
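As an illustration of the two points above (the <timestamp>:<value> argument and the 8-byte datapoint), here is a small Python sketch. It is not Ceres code: parse_datapoint and pack_value are hypothetical helpers, and the byte order used for packing is an assumption, since only the size is visible from the outside.

```python
import struct
import time

DATAPOINT_SIZE = 8  # bytes per stored value, as seen from the slice size


def parse_datapoint(arg, now=None):
    """Turn '<timestamp>:<value>' into (int timestamp, float value).

    'N' as the timestamp means 'now'; `now` can be passed in for testing.
    """
    timestamp, value = arg.split(":", 1)
    if timestamp == "N":
        timestamp = time.time() if now is None else now
    return int(float(timestamp)), float(value)


def pack_value(value):
    """Pack one value as an 8-byte double (byte order is an assumption)."""
    return struct.pack("!d", value)


point = parse_datapoint("N:1", now=1371759490)
print(point)                       # the integer 1 becomes the float 1.0
print(len(pack_value(point[1])))   # size of one packed datapoint in bytes
```

Only the value is packed; the timestamp is not written anywhere, which matches the observation that a slice with one datapoint is exactly 8 bytes long.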

-rw-r--r-- 1 tmp tmp 8 Jun 20 23:18 1371759490@10.slice
1371759490 - the timestamp of the first datapoint in the slice,
@10 - the time step interval for this slice. When a new slice is created, it gets the 'timeStep' value specified in the '.ceres-node' metadata file,
8 - the file size in bytes.
Knowing the slice file size, we can always tell how many datapoints it contains, and knowing the 'timeStep' we can always tell what period of time the slice covers.
A rough formula for this (DATAPOINT_SIZE is 8 bytes):
            end_time = start_time + (getsize(file) / DATAPOINT_SIZE) * timeStep
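The rough formula can be sketched in Python as follows. The helper names are hypothetical, and the file size is passed in as a number so the example runs without a real slice file; in practice os.path.getsize would supply it.

```python
DATAPOINT_SIZE = 8  # bytes per datapoint


def parse_slice_name(filename):
    """Split '<start>@<step>.slice' into (start_time, time_step)."""
    if not filename.endswith(".slice"):
        raise ValueError("not a slice file: %r" % filename)
    stem = filename[: -len(".slice")]
    start, step = stem.split("@")
    return int(start), int(step)


def slice_time_range(filename, file_size):
    """Return (start_time, end_time, point_count) for a slice file."""
    start_time, time_step = parse_slice_name(filename)
    point_count = file_size // DATAPOINT_SIZE
    end_time = start_time + point_count * time_step
    return start_time, end_time, point_count


# The 8-byte slice from the listing holds one datapoint covering 10 seconds.
print(slice_time_range("1371759490@10.slice", 8))
```

For the slice in the listing this gives a start of 1371759490, an end of 1371759500 and a count of one datapoint.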

Go on, add more datapoints:
ceres-node-write --tree $CERES_TREE $NODE N:2.0
ceres-node-write --tree $CERES_TREE $NODE N:3.0
ceres-node-write --tree $CERES_TREE $NODE N:4.5
ceres-node-write --tree $CERES_TREE $NODE N:234234.0
ceres-node-write --tree $CERES_TREE $NODE
ceres-node-write --tree $CERES_TREE $NODE
ceres-node-write --tree $CERES_TREE $NODE N:333
and check the file sizes:
ls -l `ceres-tree-find $CERES_TREE ${NODE} --fspath`
-rw-r--r-- 1 tmp tmp  8 Jun 20 23:18 1371759490@10.slice
-rw-r--r-- 1 tmp tmp 48 Jun 20 23:25 1371759880@10.slice
Wow, we've got two slice files now instead of one. Why?
That's because some time had passed since we wrote our first datapoint.
In other words, the time gap between the last written datapoint and the new one was large enough for Ceres to start a new slice instead of continuing to write to the previous one. If the gap had been less than some 'maxSliceGap' interval, Ceres would first have written a series of None values for the missing datapoints to the previous slice (to align the timestamp positions) and then written the new value.
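The decision described above can be modelled with a few lines of Python. This is only a sketch of the rule as stated here, not Ceres's actual code: plan_write is a made-up function, timestamps are assumed to be already aligned to the time step, and the 300-second threshold is an arbitrary value chosen for the example, not the real 'maxSliceGap'.

```python
def plan_write(slice_end, new_timestamp, time_step, max_slice_gap):
    """Decide how a new datapoint relates to the latest slice.

    slice_end is the timestamp of the first free slot after the last
    stored datapoint. Returns ("new_slice", 0) when the gap is too large,
    otherwise ("append", n) where n is the number of None fillers needed.
    """
    gap = new_timestamp - slice_end
    if gap >= max_slice_gap:
        return ("new_slice", 0)
    missing_points = max(gap, 0) // time_step
    return ("append", missing_points)


# The first slice ends at 1371759500; the next write arrived at 1371759880.
print(plan_write(1371759500, 1371759880, 10, max_slice_gap=300))
# A write only 30 seconds later would pad the existing slice instead.
print(plan_write(1371759500, 1371759530, 10, max_slice_gap=300))
```

With the hypothetical threshold, the 380-second gap starts a new slice, while a 30-second gap appends to the old slice after three None fillers.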

Although Ceres was designed with irregular updates in mind and aims to store only actual values, this is a trade-off between slice file size and the number of slices created.

Also, we see that we submitted 7 new datapoints, but only 6 were actually saved (48/8 = 6).

To see the values stored in the node:
echo $((48/8*10))

echo $((1371759880+48/8*10))
ceres-node-read --tree=$CERES_TREE $NODE --fromtime=1371759880 --untiltime=$((1371759880+48/8*10))
Thu Jun 20 23:24:50 2013 2.0
Thu Jun 20 23:25:00 2013 4.5
Thu Jun 20 23:25:10 2013 234234.0
Thu Jun 20 23:25:20 2013 None
Thu Jun 20 23:25:30 2013 333.0
Thu Jun 20 23:25:40 2013 None
We see that some datapoints (3.0 and one of the 'None' values) are missing.
The reason is that the node timeStep is 10 seconds, and I submitted some values too often.
The value '3.0' for 'Thu Jun 20 23:25:00 2013' was submitted but then overwritten by '4.5', because both were submitted within the same 10-second interval and the latter value was stored. The same happened with either 'None' or '333.0'.
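The overwriting effect is easy to reproduce in a toy model. The sketch below assumes that each timestamp is rounded down to a multiple of timeStep and that the last write to a slot wins; the store function and the timestamps are invented for the example.

```python
def store(points, time_step):
    """Keep one value per time_step interval; later writes win."""
    slots = {}
    for timestamp, value in points:
        # Round the timestamp down to the start of its interval.
        slot = timestamp - (timestamp % time_step)
        slots[slot] = value
    return sorted(slots.items())


# Two writes land in the same 10-second interval, the third in the next one.
written = [(1371759900, 3.0), (1371759904, 4.5), (1371759911, 234234.0)]
print(store(written, 10))
```

Three values go in, but only two slots come out: 3.0 shares a slot with 4.5 and is replaced by it.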

To view node datapoints for a relative period (say, the last 5 minutes), I use:
ceres-node-read --tree=$CERES_TREE $NODE --fromtime=`date -d'5 minute ago' +%s` --untiltime=`date -d'now' +%s`
1371764460 None
1371764540 None
Note that here we have no idea whether the datapoints are read from one slice or from many slices.
Also, looking at a series of 'None' values, we can't tell whether they are values stored in a slice or just gaps between different slices. Ceres handles all of this behind the scenes.
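A simplified model shows why the two kinds of 'None' look the same to the reader. This is not how Ceres implements reads; read_range and the tiny slices below are hypothetical, with each slice given as a start time and a list of values.

```python
def read_range(slices, from_time, until_time, time_step):
    """Return [(timestamp, value_or_None)] for from_time <= t < until_time.

    slices is a list of (start_time, [values]); slots not covered by any
    slice are reported as None, just like None values stored in a slice.
    """
    stored = {}
    for start_time, values in slices:
        for index, value in enumerate(values):
            stored[start_time + index * time_step] = value
    first = from_time - (from_time % time_step)
    return [(ts, stored.get(ts)) for ts in range(first, until_time, time_step)]


# One slice with a single value, then a later slice that begins with None.
slices = [(100, [1.0]), (130, [None, 2.0])]
for timestamp, value in read_range(slices, 100, 150, 10):
    print(timestamp, value)
```

In the output, 110 and 120 are a gap between slices and 130 is a None stored in the second slice, yet all three are reported identically.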

A handy use case is finding the nodes that actually store datapoints for a specific time interval:
ceres-tree-find $CERES_TREE ${NODE} --fspath --fromtime=`date -d'5 minute ago' +%s` --untiltime=`date -d'now' +%s`

ceres-tree-find $CERES_TREE ${NODE} --fspath --fromtime=`date -d'30 minute ago' +%s` --untiltime=`date -d'now' +%s`
