Notes on rrdb

Usage

The three functions in the package allow for the creation and manipulation of the database file.

First, load the library and create some test data.

library(rrdb)
library(xts)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric

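# timestamps every 900 s (15 min) from 1990-01-01 to 1990-02-01, in UTC
ts <- seq(as.POSIXct("1990-01-01", tz = "UTC"), as.POSIXct("1990-02-01", tz = "UTC"), by = 900)
# a single column of test values indexed by those timestamps
D <- xts(seq_along(ts), order.by = ts)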

Note that the time step of the data is 900 seconds. Extending the sequence back in 900-second steps would place an observation at exactly 1970-01-01 00:00:00, so there is no offset between the origin and the time step sequence.
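
The alignment can be checked by taking the timestamps modulo the step. This is only an illustrative sketch in base R and is not part of rrdb; the names `step`, `stamps` and `offset` are chosen here for the example.

```r
# Illustrative check (not an rrdb function): do all timestamps sit on the
# 900 s grid measured from 1970-01-01 00:00:00 UTC?
step <- 900
stamps <- seq(as.POSIXct("1990-01-01", tz = "UTC"),
              as.POSIXct("1990-02-01", tz = "UTC"), by = step)

# seconds since the epoch, reduced modulo the step; 0 means "on the grid"
offset <- as.numeric(stamps) %% step
all(offset == 0)
```

A non-zero remainder would indicate that the series is shifted relative to the origin.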

Now create a database that holds slightly fewer records than the data.

## use a temporary file
fn <- tempfile()
on.exit( unlink(fn) )

tz_step <- 900
n <- nrow(D)-20
create_db(fn,tz_step,n,ncol(D))

Next, write the first n data points to the database.

update_db(fn,D[1:n,])

To read from the database we supply start and end times for the data. All data in this period will be returned as an xts object.

head( read_db(fn,index(D)[10],index(D)[n-5]) )
#> Warning: object timezone ('UTC') is different from system timezone ('')
#>   NOTE: set 'options(xts_check_TZ = FALSE)' to disable this warning
#>     This note is displayed once per session
#>                     [,1]
#> 1990-01-01 02:15:00   10
#> 1990-01-01 02:30:00   11
#> 1990-01-01 02:45:00   12
#> 1990-01-01 03:00:00   13
#> 1990-01-01 03:15:00   14
#> 1990-01-01 03:30:00   15

Updating the database with more data than it can hold results in only the most recent data being stored.

update_db(fn,D)
head( read_db(fn,index(D)[n],index(D)[nrow(D)]) )
#> Warning: object timezone ('UTC') is different from system timezone ('')
#>                     [,1]
#> 1990-01-31 19:00:00 2957
#> 1990-01-31 19:15:00 2958
#> 1990-01-31 19:30:00 2959
#> 1990-01-31 19:45:00 2960
#> 1990-01-31 20:00:00 2961
#> 1990-01-31 20:15:00 2962
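
One way to picture this behaviour is a ring buffer in which each timestamp maps to a slot and later records overwrite earlier ones once the slots run out. The toy sketch below shows that general technique only. It is an assumption for illustration, not a description of rrdb's actual file layout, and `slot_of`, `buf_time` and `buf_val` are hypothetical names.

```r
step <- 900   # seconds between records
n <- 5        # number of slots in the toy buffer

# Hypothetical mapping from a timestamp (seconds since epoch) to a 1-based slot
slot_of <- function(t) (t %/% step) %% n + 1

buf_time <- rep(NA_real_, n)  # timestamp held in each slot
buf_val  <- rep(NA_real_, n)  # value held in each slot

# write 8 records into 5 slots; the last 3 wrap around and overwrite
t <- seq(0, by = step, length.out = 8)
for (i in seq_along(t)) {
  s <- slot_of(t[i])
  buf_time[s] <- t[i]
  buf_val[s]  <- i
}

# read back in time order: only the 5 most recent records remain
o <- order(buf_time)
buf_val[o]
```

Records 1 to 3 are lost because records 6 to 8 land in the same slots.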

Implementation notes

The functions are written in pure R. They are not an especially efficient implementation; in particular, the calls to seek when writing and reading data are always made from the start of the file. Since both xts objects and the database are ordered, this could be greatly improved.
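
As a rough illustration of the possible gain, the sketch below compares seeking from the start of the file for every record with a single seek followed by one bulk read of a contiguous run. The file layout used here (bare 8-byte doubles, one per record, no header) is an assumption made for the example and is not rrdb's format.

```r
# toy file: 100 records, each a single 8-byte double
fn_demo <- tempfile()
writeBin(as.numeric(1:100), fn_demo)
rec_size <- 8
first <- 10
last <- 15

con <- file(fn_demo, "rb")

# one seek from the start of the file per record
per_record <- vapply(first:last, function(i) {
  seek(con, where = (i - 1) * rec_size, origin = "start")
  readBin(con, what = "double", n = 1)
}, numeric(1))

# ordered, contiguous records: seek once, then read the whole run
seek(con, where = (first - 1) * rec_size, origin = "start")
bulk <- readBin(con, what = "double", n = last - first + 1)

close(con)
unlink(fn_demo)

identical(per_record, bulk)
```

Both approaches return the same values; the second makes one seek and one read call instead of one of each per record.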

Using clean=FALSE when writing to the database can be much faster; however, this removes all checks that stop newer data being overwritten by older values that may be passed in. With the checks in place, such an attempt results in an error.
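
The kind of check being skipped can be sketched as follows. This is a hypothetical guard written for illustration; `guard_write` is not an rrdb function, and the package's own checks and error message may differ.

```r
# Hypothetical guard: refuse a write that would replace newer data with older data
guard_write <- function(stored_time, new_time) {
  # stored_time: timestamp currently held in each target slot (NA if empty)
  # new_time:    timestamp of the record about to be written to that slot
  clash <- !is.na(stored_time) & stored_time > new_time
  if (any(clash)) stop("write would replace newer data with older data")
  invisible(TRUE)
}

stored <- c(4500, 5400, NA)

# newer data passes the check
ok <- guard_write(stored, c(9000, 9900, 10800))

# older data is rejected; capture the error message instead of stopping
msg <- tryCatch(guard_write(stored, c(0, 900, 1800)),
                error = function(e) conditionMessage(e))
```

Skipping a comparison like this saves reading the stored timestamps before each write, which is where the speed-up would come from in this sketch.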