The purpose of this vignette is to present the calculations of the costs for the univariate Gaussian distribution.
Each time step $t$ belongs to a group $k$ whose time stamps form the set $T_k$. A group can have an additive mean anomaly $\mu_k$ and a multiplicative variance anomaly $\sigma_k$, both common to all $t \in T_k$. Assuming the known mean $m_t$ and variance $s_t$ of the data-generating distribution gives, for $t \in T_k$,
$$ P\left(y_t \left| m_{t},s_{t}, \mu_k,\sigma_k\right.\right) = \frac{1}{\sqrt{2\pi\sigma_{k}s_{t}}}\exp\left(-\frac{1}{2\sigma_{k}s_{t}}\left(y_{t} - m_t - \mu_{k}\right)^2\right) $$
The cost is computed as twice the negative log likelihood plus a penalty term $\beta$, giving
$$ C\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \mu_k,\sigma_k\right.\right) = n_{k} \log\left(2\pi \sigma_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\frac{1}{\sigma_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t - \mu_{k}\right)^2}{s_{t}} + \beta $$
In the baseline (no anomaly) case $\mu_k = 0$ and $\sigma_k = 1$ and there is no penalty, so the cost is
$$ C_{B}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}} \right.\right) = n_{k} \log\left(2\pi \right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t\right)^2}{s_{t}} $$
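The general cost and its baseline special case can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the package's implementation; the function name `gaussian_cost` and the example data are assumptions made for this sketch.

```python
import math

def gaussian_cost(y, m, s, mu=0.0, sigma=1.0, beta=0.0):
    """Twice the negative log likelihood of one group plus the penalty beta.

    y, m, s are equal-length sequences of observations, known means and
    known variances. mu is the additive mean anomaly and sigma the
    multiplicative variance anomaly (hypothetical argument names).
    """
    n = len(y)
    log_terms = n * math.log(2 * math.pi * sigma) + sum(math.log(st) for st in s)
    scaled_sq = sum((yt - mt - mu) ** 2 / st for yt, mt, st in zip(y, m, s))
    return log_terms + scaled_sq / sigma + beta

# Made-up data, purely for illustration.
y = [0.3, -1.2, 2.5]
m = [0.0, 0.0, 0.5]
s = [1.0, 2.0, 0.5]

# Baseline cost: mu = 0, sigma = 1 and no penalty.
baseline = gaussian_cost(y, m, s)
```

Leaving `mu`, `sigma` and `beta` at their defaults recovers the baseline cost, so the same function can serve for every case discussed in this vignette.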
Collective anomalies last more than a single time step and change the mean and/or variance.
Estimates μ̂ of μ and σ̂ of σ can be selected to minimise the cost by taking
$$ \hat{\mu}_{k} = \left( \sum\limits_{t \in T_k} \frac{y_t-m_t}{s_t} \right)\left( \sum\limits_{t \in T_k} \frac{1}{s_t}\right)^{-1} $$ and $$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t - \hat{\mu}_{k}\right)^2}{s_t} $$
Substituting these into the cost gives $$ C_{MV}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k,\hat{\sigma}_k\right.\right) = n_{k} \log\left(2\pi \hat{\sigma}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +n_{k} + \beta $$
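As a hedged illustration, the estimates and the resulting closed-form cost can be computed as follows. The function names are hypothetical and chosen for this sketch only. The sketch assumes at least one non-zero residual, since otherwise $\hat{\sigma}_k = 0$ and the logarithm is undefined.

```python
import math

def mean_variance_estimates(y, m, s):
    """Return (mu_hat, sigma_hat) for one group, following the formulas above."""
    weights = [1.0 / st for st in s]
    # Precision-weighted average of the residuals y_t - m_t.
    mu_hat = sum((yt - mt) * w for yt, mt, w in zip(y, m, weights)) / sum(weights)
    # Mean of the squared, variance-scaled residuals about mu_hat.
    sigma_hat = sum((yt - mt - mu_hat) ** 2 * w
                    for yt, mt, w in zip(y, m, weights)) / len(y)
    return mu_hat, sigma_hat

def cost_mean_variance(y, m, s, beta):
    """Closed-form C_MV obtained after substituting the estimates."""
    n = len(y)
    _, sigma_hat = mean_variance_estimates(y, m, s)  # must be > 0
    return (n * math.log(2 * math.pi * sigma_hat)
            + sum(math.log(st) for st in s) + n + beta)
```

The term $n_k$ appears because the scaled sum of squares divided by $\hat{\sigma}_k$ equals $n_k$ by construction of $\hat{\sigma}_k$.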
For an anomaly in mean only there is no change in variance, so $\sigma_k = 1$. The estimate $\hat{\mu}_k$ is unchanged from that for an anomaly in mean and variance, so the cost is
$$ C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t - \hat{\mu}_{k}\right)^2}{s_t} + \beta $$
This can be written as
$$ C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t\right)^2}{s_t} -\hat{\mu}_{k}^{2} \sum\limits_{t \in T_k} \frac{ 1}{s_t} + \beta $$
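The equivalence of the two expressions for the cost of an anomaly in mean can be checked numerically. The sketch below is illustrative only; `cost_mean_both_forms` is a hypothetical helper that returns both forms so they can be compared.

```python
import math

def cost_mean_both_forms(y, m, s, beta):
    """Return C_M computed directly and via the expanded expression."""
    n = len(y)
    inv_s_sum = sum(1.0 / st for st in s)
    mu_hat = sum((yt - mt) / st for yt, mt, st in zip(y, m, s)) / inv_s_sum
    const = n * math.log(2 * math.pi) + sum(math.log(st) for st in s)
    # Direct form: residuals taken about the estimated mean anomaly.
    direct = const + sum((yt - mt - mu_hat) ** 2 / st
                         for yt, mt, st in zip(y, m, s)) + beta
    # Expanded form: baseline sum of squares minus mu_hat^2 * sum(1 / s_t).
    expanded = (const + sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s))
                - mu_hat ** 2 * inv_s_sum + beta)
    return direct, expanded
```

The expanded form is convenient in practice because it only needs running sums of $(y_t - m_t)^2 / s_t$, $(y_t - m_t)/s_t$ and $1/s_t$ over the group.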
For an anomaly in variance only there is no mean anomaly, so $\mu_k = 0$. The estimate $\hat{\sigma}_k$ therefore changes to
$$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t\right)^2}{s_t} $$
and the cost is
$$ C_{V}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\sigma}_k\right.\right) = n_{k} \log\left(2\pi \hat{\sigma}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +n_{k} + \beta $$
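A small sketch of the variance-only cost, compared against the baseline cost on made-up data with inflated spread. The function names and the data are assumptions for illustration; the sketch requires at least one non-zero residual so that $\hat{\sigma}_k > 0$.

```python
import math

def cost_variance(y, m, s, beta):
    """C_V with mu_k fixed at 0 and sigma_k replaced by its estimate."""
    n = len(y)
    sigma_hat = sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s)) / n
    return (n * math.log(2 * math.pi * sigma_hat)
            + sum(math.log(st) for st in s) + n + beta)

def cost_baseline(y, m, s):
    """C_B: no anomaly and no penalty."""
    return (len(y) * math.log(2 * math.pi) + sum(math.log(st) for st in s)
            + sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s)))

# Made-up group whose spread is three times the assumed standard deviation.
y = [3.0, -3.0, 3.0, -3.0]
m = [0.0] * 4
s = [1.0] * 4
difference = cost_variance(y, m, s, beta=10.0) - cost_baseline(y, m, s)
```

A negative `difference` means the variance anomaly gives the lower cost for this group at the chosen penalty.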
A point anomaly at time $t$ is treated as a single time step with a change in mean or variance. However, the cost of the point anomaly should be higher than the background cost when $y_t$ is, in some sense, close to the background.
The cost of a point anomaly in mean is expressed as
$$ C_{PM}\left(y_{t} \left| m_{t}, s_{t}, \hat{\mu}_k\right.\right) = \log\left(2\pi s_{t}\right) + \beta $$
while its value relative to the baseline cost can be expressed using the standardised variable $z_{t} = \frac{y_t-m_t}{\sqrt{s_{t}}}$ as
$$ C_{PM}\left(y_{t} \left| m_{t}, s_{t}, \hat{\mu}_k\right.\right) - C_{B}\left(y_{t} \left| m_{t}, s_{t}\right.\right) = \beta - z_{t}^{2} $$
The penalty value in this case can then be clearly linked to the number of standard deviations away from the mean at which to declare a point anomaly.
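To make the link concrete: the difference $\beta - z_{t}^{2}$ is negative exactly when $|z_t| > \sqrt{\beta}$, so a threshold of $c$ standard deviations corresponds to $\beta = c^2$. The following sketch illustrates this; the function names are hypothetical, and whether a point is finally reported also depends on the rest of the search.

```python
import math

def point_mean_cost_difference(y_t, m_t, s_t, beta):
    """C_PM - C_B for a single observation; equals beta - z_t**2."""
    z_sq = (y_t - m_t) ** 2 / s_t
    c_pm = math.log(2 * math.pi * s_t) + beta
    c_b = math.log(2 * math.pi) + math.log(s_t) + z_sq
    return c_pm - c_b

def penalty_for_threshold(c):
    """Penalty at which the difference changes sign at |z_t| = c."""
    return c ** 2

beta = penalty_for_threshold(3.0)  # threshold of 3 standard deviations
inside = point_mean_cost_difference(2.0, 0.0, 1.0, beta)   # z_t = 2
outside = point_mean_cost_difference(4.0, 0.0, 1.0, beta)  # z_t = 4
```

Here `inside` is positive, so the baseline is cheaper, while `outside` is negative, so the point anomaly is cheaper.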
In the case of a point anomaly in variance a naive computation of the cost gives
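$$ C_{PV}\left(y_{t} \left| m_{t}, s_{t}, \hat{\sigma}_k\right.\right) = \log\left(2\pi s_{t}\right) + \log\left(z_{t}^{2}\right) + 1 + \beta $$
and
$$ C_{PV}\left(y_{t} \left| m_{t}, s_{t}, \hat{\sigma}_k\right.\right) - C_{B}\left(y_{t} \left| m_{t}, s_{t}\right.\right) = \log\left(z_{t}^{2}\right) + 1 + \beta - z_{t}^{2} $$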
Since $\log\left(z_{t}^{2}\right) \to -\infty$ as $z_{t}^{2} \to 0$, the naive definition of a point anomaly in variance will always produce point anomalies when $z_t$ is close to 0. Fisch et al. introduce a term $\gamma$ to control this. The modified cost of a point anomaly in variance is expressed as
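$$ C_{PV}\left(y_{t} \left| m_{t}, s_{t}, \hat{\sigma}_k, \gamma\right.\right) = \log\left(2\pi s_{t}\right) + \log\left(\gamma + z_{t}^{2}\right) + 1 + \beta $$
Relating this to the background cost, we see that point anomalies may be accepted in the capa search when
$$ f\left(z_{t},\gamma,\beta\right) = C_{PV}\left(y_{t} \left| m_{t}, s_{t}, \hat{\sigma}_k, \gamma\right.\right) - C_{B}\left(y_{t} \left| m_{t}, s_{t}\right.\right) = \log\left(\gamma + z_{t}^{2}\right) + 1 + \beta - z_{t}^{2} < 0 $$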
To ensure that anomalies are not declared when $z_t$ is close to 0, $\gamma$ should be selected such that
$$ f\left(0,\gamma,\beta\right) \geq 0 $$
and $\gamma < 1$, so that the gradient $$ \frac{\partial}{\partial z_{t}^2} f\left(z_{t},\gamma,\beta\right) = \frac{1}{\gamma + z_{t}^{2}} - 1 > 0 $$ for $z_t$ close to zero.
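Since $f\left(0,\gamma,\beta\right) = \log\left(\gamma\right) + 1 + \beta$, the first condition is equivalent to $\gamma \geq \exp\left(-\left(1+\beta\right)\right)$. The sketch below evaluates $f$ and this lower bound. The function names are hypothetical, and the sketch does not claim that any particular software uses this exact value of $\gamma$.

```python
import math

def f(z, gamma, beta):
    """Modified point-anomaly-in-variance cost minus the baseline cost."""
    z_sq = z * z
    return math.log(gamma + z_sq) + 1 + beta - z_sq

def smallest_gamma(beta):
    """Smallest gamma satisfying f(0, gamma, beta) >= 0."""
    return math.exp(-(1 + beta))

def gradient_wrt_z_sq(z, gamma):
    """Derivative of f with respect to z**2."""
    return 1.0 / (gamma + z * z) - 1.0

beta = 4.0                      # illustrative penalty
gamma = smallest_gamma(beta)    # below 1 whenever beta > -1
at_zero = f(0.0, gamma, beta)   # zero up to floating point rounding
slope_at_zero = gradient_wrt_z_sq(0.0, gamma)
```

With this choice $f$ starts at zero and increases for small $z_t$, so observations close to the background are not cheaper as point anomalies.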
The following plot shows the impact for small z of three different choices of γ:
It is clear that the differences become small as $z$ increases. This is supported by the plot below, which shows the value of $z_t$ at which a point anomaly might occur as $\beta$ varies. Values in the area above the line are potential anomalies.
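The boundary described here can be approximated numerically by solving $f\left(z_{t},\gamma,\beta\right) = 0$. The sketch below uses bisection and assumes $0 < \gamma < 1$, $\beta \geq 0$ and $f\left(0,\gamma,\beta\right) \geq 0$, in which case $f$ rises to a maximum at $z_{t}^{2} = 1 - \gamma$ and then decreases through a single root. The function names, the tolerance and the choice $\gamma = \exp\left(-\left(1+\beta\right)\right)$ in the usage loop are assumptions made for illustration.

```python
import math

def f(z, gamma, beta):
    """Modified point-anomaly-in-variance cost minus the baseline cost."""
    z_sq = z * z
    return math.log(gamma + z_sq) + 1 + beta - z_sq

def boundary_z(beta, gamma, tol=1e-10):
    """Value of |z_t| above which f becomes negative, found by bisection."""
    lo = math.sqrt(1.0 - gamma)       # f is maximal here: beta + gamma > 0
    hi = lo + 1.0
    while f(hi, gamma, beta) > 0:     # expand until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, gamma, beta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Boundary for a few illustrative penalties.
for beta in (1.0, 2.0, 4.0, 8.0):
    gamma = math.exp(-(1 + beta))
    print(beta, boundary_z(beta, gamma))
```

Larger penalties push the boundary outwards, so a larger $|z_t|$ is needed before a point anomaly in variance can reduce the cost.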