Let the observed data $\mathbf{y}_{1:T} = (y_1, \ldots, y_T)$, indexed by time, come from a parametric model with parameter vectors $\theta_t$ at each time step and density $P(y_t \mid \theta_t)$. The parameters in $\theta_t$ may be time varying or time invariant. Anomalies are modelled as parametric epidemic changepoints: changes in the parameters $\theta_t$ that are common to all time steps within the anomaly.
The $i$th anomalous period consists of $n_i$ consecutive time steps, denoted by the set $T^{\left[i\right]}$. The $K$ anomalous periods are pairwise disjoint, so $T^{\left[i\right]} \cap T^{\left[j\right]} = \emptyset$ for all $i \neq j$, and ordered such that $\max\limits_{t \in T^{\left[i\right]}} t < \min\limits_{t \in T^{\left[j\right]}} t$ for all $i < j$. The variation in the parameters caused by the anomalous periods is given by $$ \theta_{t} = \left\{ \begin{array}{ll} \theta_{t}^{\left[1\right]} & t \in T^{\left[1\right]} \\ & \vdots \\ \theta_{t}^{\left[K\right]} & t \in T^{\left[K\right]} \\ \theta_{t}^{\left[0\right]} & \mathrm{otherwise} \end{array} \right. $$ The density and the values of $\theta_{t}^{\left[0\right]}$ determine the non-anomalous behaviour of the process generating the observed data. If these are considered known a priori, then the anomalous periods can be determined by selecting $K, T^{\left[1\right]}, \ldots, T^{\left[K\right]}$ to minimise the penalised cost
$$ \sum\limits_{t\notin\cup T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[0\right]}\right) + \sum\limits_{i=1,\ldots,K}\left\{ \min_{\theta_{t}^{\left[i\right]}}\left( \sum\limits_{t \in T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[i\right]}\right) \right) + \beta \right\} $$
subject to $n_i > l$. The minimum anomaly length $l$ is related to the anomaly cost function $\mathcal{C}(y_t, \theta_t)$ and ensures that the minimum with respect to $\theta_{t}^{\left[i\right]}$ can be found. Concrete examples of cost functions for this framework can be found in the cost function vignettes.
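As an illustration of minimising the penalised cost above, the following is a minimal sketch and not the package's implementation. It assumes a squared-error cost $\mathcal{C}(y_t,\theta_t) = (y_t - \theta_t)^2$, a known constant baseline `theta0`, and a single fitted mean per anomalous period. The function name `min_penalised_cost` and the argument `min_len` (the smallest permitted period length, i.e. $l + 1$ under the constraint $n_i > l$) are hypothetical. The minimisation uses a simple quadratic-time dynamic programme over the end point of the last anomalous period.

```python
import numpy as np

def min_penalised_cost(y, beta, min_len=2, theta0=0.0):
    """Minimise the penalised cost over K and the anomalous periods.

    Assumes a squared-error cost and a known baseline mean theta0.
    Returns the minimum cost and the periods as half-open,
    0-based index pairs (start, end).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums give the cost of any segment in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def fitted_cost(s, t):
        # Squared-error cost of y[s:t] with the segment mean fitted.
        return s2[t] - s2[s] - (s1[t] - s1[s]) ** 2 / (t - s)

    best = np.zeros(n + 1)      # best[t]: minimum cost of y[:t]
    start = [None] * (n + 1)    # start[t]: start of a period ending at t
    for t in range(1, n + 1):
        # Option 1: observation t-1 follows the baseline.
        best[t] = best[t - 1] + (y[t - 1] - theta0) ** 2
        # Option 2: an anomalous period y[s:t] ends at t.
        for s in range(0, t - min_len + 1):
            cand = best[s] + fitted_cost(s, t) + beta
            if cand < best[t]:
                best[t] = cand
                start[t] = s

    # Backtrack to recover the anomalous periods.
    periods, t = [], n
    while t > 0:
        if start[t] is None:
            t -= 1
        else:
            periods.append((start[t], t))
            t = start[t]
    return float(best[n]), periods[::-1]

# Synthetic example: a block of shifted values inside a zero baseline.
y = [0.0] * 10 + [5.0] * 5 + [0.0] * 10
print(min_penalised_cost(y, beta=3.0))
```

The sketch makes the role of $\beta$ visible: a period is only declared anomalous when fitting its own parameter reduces the cost by more than the penalty.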
One possible definition of $\mathcal{C}(y_t, \theta_t)$ is the negative log-likelihood of the data under the parametric model. In such cases, common choices for the penalty $\beta$ are based on information criteria commonly used for model selection. As noted in <>, in practical settings many of these criteria perform poorly. Instead, in the following section the CROPS algorithm, which offers a graphical method for selecting the penalty term in changepoint analysis, is adapted for use in this anomaly framework.
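A negative log-likelihood cost can be sketched as follows for one concrete case. The Gaussian model, the parameterisation $\theta_t = (\mu, \sigma)$ and the function name `gaussian_nll` are assumptions made for illustration; the cost function vignettes remain the reference for the costs actually used.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of one observation y under N(mu, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z

# Cost of an observation two standard deviations from the mean.
print(gaussian_nll(2.0, mu=0.0, sigma=1.0))
```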
The minimum penalised cost, $Q\left(\mathbf{y}_{1:T},\beta\right)$, can be related to the minimum cost of a partition with $K$ anomalies, given by
$$ Q_{K}\left(\mathbf{y}_{1:T}\right) = \min\limits_{T^{\left[1\right]},\ldots,T^{\left[K\right]}} \left(\sum\limits_{t\notin\cup T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[0\right]}\right) + \sum\limits_{i=1,\ldots,K}\left\{ \min_{\theta_{t}^{\left[i\right]}}\left( \sum\limits_{t \in T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[i\right]}\right) \right) \right\} \right) $$
through $$ Q\left(\mathbf{y}_{1:T},\beta\right) = \min\limits_{K} \left( Q_{K}\left(\mathbf{y}_{1:T}\right) + K\beta \right) $$
This is exactly the form considered in the CROPS paper, so Theorem 3.1 and the CROPS algorithm still apply.
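The relationship between $Q_K$ and $Q(\mathbf{y}_{1:T},\beta)$ can be made concrete with a small sketch. The function name `penalised_minimum` is hypothetical, and the values in `q_values` are made-up numbers chosen only for illustration, not results from any data set. Because $Q(\mathbf{y}_{1:T},\beta)$ is a minimum over lines $Q_K + K\beta$ in $\beta$, it is piecewise linear and the optimal $K$ cannot increase as $\beta$ grows, which is the structure a penalty search over a range of $\beta$ exploits.

```python
def penalised_minimum(q_values, beta):
    """Return (Q(beta), optimal K), where q_values[K] holds Q_K for K = 0..K_max.

    Ties are broken in favour of the smaller number of anomalies.
    """
    best_k = min(range(len(q_values)),
                 key=lambda k: (q_values[k] + k * beta, k))
    return q_values[best_k] + best_k * beta, best_k

# Hypothetical values of Q_0, ..., Q_3 (illustration only).
q_values = [100.0, 40.0, 30.0, 28.0]
for beta in [1.0, 5.0, 20.0, 100.0]:
    cost, k = penalised_minimum(q_values, beta)
    print(f"beta={beta:6.1f}  optimal K={k}  Q={cost:.1f}")
```

Plotting the optimal $K$, or the unpenalised cost $Q_K$, against $\beta$ gives the kind of graphical diagnostic from which a penalty can be chosen.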
Select the penalty based on the number of standard deviations away from the mean, then run CROPS for collective anomalies. TODO - document that this is correct