Univariate Gaussian Cost Calculations

The purpose of this vignette is to present the calculations of the costs for the univariate Gaussian distribution.

Each time step t belongs to a group k whose time stamps form the set T_k. A group can have an additive mean anomaly μ_k and a multiplicative variance anomaly σ_k, both common to all t ∈ T_k. Assuming that the mean m_t and variance s_t of the data-generating distribution are known gives, for t ∈ T_k,

$$ P\left(y_t \left| m_{t},s_{t}, \mu_k,\sigma_k\right.\right) = \frac{1}{\sqrt{2\pi\sigma_{k}s_{t}}}\exp\left(-\frac{1}{2\sigma_{k}s_{t}}\left(y_{t} - m_t - \mu_{k}\right)^2\right) $$

The cost is computed as twice the negative log-likelihood, summed over the n_k time steps in T_k, plus a penalty term β, giving

$$ C\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \mu_k,\sigma_k\right.\right) = n_{k} \log\left(2\pi \sigma_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\frac{1}{\sigma_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t - \mu_{k}\right)^2}{s_{t}} + \beta $$
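As a sketch of how this cost can be evaluated, the Python function below computes C for one group directly from the formula. It assumes y, m and s are equal-length lists holding y_t, m_t and s_t for t ∈ T_k; the name `segment_cost` is hypothetical and not taken from any package.

```python
import math


def segment_cost(y, m, s, mu, sigma, beta):
    """Cost C for one group T_k (hypothetical helper).

    Twice the negative Gaussian log-likelihood of y given known means m and
    variances s, with additive mean anomaly mu and multiplicative variance
    anomaly sigma shared by the group, plus the penalty beta.
    """
    n = len(y)
    sum_log_s = sum(math.log(st) for st in s)
    scaled_ss = sum((yt - mt - mu) ** 2 / st for yt, mt, st in zip(y, m, s))
    return n * math.log(2 * math.pi * sigma) + sum_log_s + scaled_ss / sigma + beta
```

Each term maps onto the formula: n_k log(2πσ_k), the sum of log s_t, the scaled sum of squares divided by σ_k, and β.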

No Anomaly (Baseline)

Here μ_k = 0, σ_k = 1 and there is no penalty, so the cost is

$$ C_{B}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}} \right.\right) = n_{k} \log\left(2\pi \right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t\right)^2}{s_{t}} $$
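The baseline cost needs no estimated parameters. A minimal sketch, again assuming equal-length lists for y_t, m_t and s_t and using the hypothetical name `baseline_cost`:

```python
import math


def baseline_cost(y, m, s):
    """Baseline cost C_B for one group: mu_k = 0, sigma_k = 1 and no penalty."""
    n = len(y)
    sum_log_s = sum(math.log(st) for st in s)
    scaled_ss = sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s))
    return n * math.log(2 * math.pi) + sum_log_s + scaled_ss
```

For a single observation y_t = 3 with m_t = 1 and s_t = 4 the scaled squared residual is 1, so C_B = log(2π) + log 4 + 1.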

Collective Anomalies

Collective anomalies last more than a single time step and change the mean and/or variance.

Anomaly in Mean and Variance

The estimates μ̂_k of μ_k and σ̂_k of σ_k that minimise the cost are

$$ \hat{\mu}_{k} = \left( \sum\limits_{t \in T_k} \frac{y_t-m_t}{s_t} \right)\left( \sum\limits_{t \in T_k} \frac{1}{s_t}\right)^{-1} $$ and $$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t - \hat{\mu}_{k}\right)^2}{s_t} $$
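The estimates are a precision-weighted mean of the residuals y_t − m_t, with weights 1/s_t, and the corresponding scaled mean squared deviation. A sketch with the hypothetical helper `estimate_mean_variance`:

```python
def estimate_mean_variance(y, m, s):
    """Return (mu_hat, sigma_hat) minimising the cost for one group (hypothetical helper)."""
    resid = [yt - mt for yt, mt in zip(y, m)]
    # mu_hat: residuals weighted by the precisions 1 / s_t
    mu_hat = sum(r / st for r, st in zip(resid, s)) / sum(1.0 / st for st in s)
    # sigma_hat: average squared deviation from mu_hat, each scaled by s_t
    sigma_hat = sum((r - mu_hat) ** 2 / st for r, st in zip(resid, s)) / len(y)
    return mu_hat, sigma_hat
```

For example, y = [1, 3], m = [0, 0] and s = [1, 2] give μ̂_k = 2.5 / 1.5 = 5/3 and σ̂_k = (4/9 + 8/9) / 2 = 2/3.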

Substituting these into the cost, where the scaled sum of squares now equals n_k σ̂_k, gives $$ C_{MV}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k,\hat{\sigma}_k\right.\right) = n_{k} \log\left(2\pi \hat{\sigma}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +n_{k} + \beta $$
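Because the scaled sum of squares divided by σ̂_k reduces to n_k, C_MV depends on the data only through σ̂_k and the variances s_t. A sketch, with the hypothetical name `cost_mean_variance`:

```python
import math


def cost_mean_variance(y, m, s, beta):
    """C_MV: cost of a collective anomaly in mean and variance (hypothetical helper)."""
    n = len(y)
    resid = [yt - mt for yt, mt in zip(y, m)]
    mu_hat = sum(r / st for r, st in zip(resid, s)) / sum(1.0 / st for st in s)
    sigma_hat = sum((r - mu_hat) ** 2 / st for r, st in zip(resid, s)) / n
    # At sigma_hat the term (1 / sigma_hat) * scaled sum of squares equals n.
    return n * math.log(2 * math.pi * sigma_hat) + sum(math.log(st) for st in s) + n + beta
```

The sketch assumes at least two time steps per group: with n_k = 1, σ̂_k is zero in exact arithmetic and the logarithm is undefined.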

Anomaly in Mean

There is no change in variance, so σ_k = 1. The estimate μ̂_k is unchanged from that for an anomaly in mean and variance, so the cost is

$$ C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t - \hat{\mu}_{k}\right)^2}{s_t} + \beta $$

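Since $\sum_{t \in T_k} \left(y_t - m_t - \hat{\mu}_k\right)^2/s_t = \sum_{t \in T_k} \left(y_t - m_t\right)^2/s_t - \hat{\mu}_k^2 \sum_{t \in T_k} 1/s_t$, this can be written relative to the baseline cost as

$$ C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) = C_{B}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}} \right.\right) - \hat{\mu}_{k}^{2}\sum\limits_{t \in T_{k}} \frac{1}{s_{t}} + \beta $$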
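The mean-only cost can be evaluated directly with σ_k fixed at 1. A sketch using the hypothetical name `cost_mean`:

```python
import math


def cost_mean(y, m, s, beta):
    """C_M: cost of a collective anomaly in mean only, sigma_k = 1 (hypothetical helper)."""
    n = len(y)
    resid = [yt - mt for yt, mt in zip(y, m)]
    mu_hat = sum(r / st for r, st in zip(resid, s)) / sum(1.0 / st for st in s)
    scaled_ss = sum((r - mu_hat) ** 2 / st for r, st in zip(resid, s))
    return n * math.log(2 * math.pi) + sum(math.log(st) for st in s) + scaled_ss + beta
```

Because μ̂_k minimises the scaled sum of squares, C_M − β is never larger than the baseline cost for the same data.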

Anomaly in Variance

There is no mean anomaly, so μ_k = 0. The estimate σ̂_k therefore becomes

$$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t\right)^2}{s_t} $$

and the cost is

$$ C_{V}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\sigma}_k\right.\right) = n_{k} \log\left(2\pi \hat{\sigma}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +n_{k} + \beta $$
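For a variance-only anomaly the residuals are not re-centred before σ̂_k is computed. A sketch with the hypothetical helper `cost_variance`:

```python
import math


def cost_variance(y, m, s, beta):
    """C_V: cost of a collective anomaly in variance only, mu_k = 0 (hypothetical helper)."""
    n = len(y)
    # sigma_hat uses the raw residuals y_t - m_t because mu_k is fixed at 0.
    sigma_hat = sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s)) / n
    return n * math.log(2 * math.pi * sigma_hat) + sum(math.log(st) for st in s) + n + beta
```

With y = [1, 3], m = [0, 0] and s = [1, 2] this uses σ̂_k = (1 + 4.5) / 2 = 2.75.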

Point Anomalies

A point anomaly at time t is treated as a single time step with a change in mean or variance. However, the cost of a point anomaly should be higher than the baseline cost when y_t is, in some sense, close to the background.

With a single observation μ̂_k = y_t − m_t, so the squared residual vanishes and the cost of a point anomaly in mean is

$$ C_{P_M}\left(y_t \left| m_t, s_t, \hat{\mu}_k\right.\right) = \log\left(2\pi s_t\right) + \beta $$

while its value relative to the baseline cost can be expressed, using the standardised variable $z_{t} = \frac{y_t-m_t}{\sqrt{s_{t}}}$, as

$$ C_{P_M}\left(y_t \left| m_t, s_t, \hat{\mu}_k\right.\right) - C_{B}\left(y_t \left| m_t, s_t\right.\right) = \beta - z_t^2 $$

The penalty therefore has a direct interpretation: a point anomaly in mean lowers the cost only when z_t² > β, that is, when y_t lies more than √β standard deviations from m_t.
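The relation between the point-anomaly cost and the baseline can be checked numerically by evaluating both single-observation costs. The helper names below are hypothetical, and β = 9 is only an illustrative value corresponding to a threshold of three standard deviations.

```python
import math


def point_mean_cost(s_t, beta):
    """C_{P_M}: with one observation mu_hat = y_t - m_t, so the squared residual is zero."""
    return math.log(2 * math.pi * s_t) + beta


def baseline_point_cost(y_t, m_t, s_t):
    """C_B for a single observation."""
    return math.log(2 * math.pi) + math.log(s_t) + (y_t - m_t) ** 2 / s_t


beta = 9.0  # illustrative penalty: sqrt(beta) = 3 standard deviations
for y_t in (2.0, 9.0):  # m_t = 1 and s_t = 4, so z_t = (y_t - 1) / 2
    diff = point_mean_cost(4.0, beta) - baseline_point_cost(y_t, 1.0, 4.0)
    print(y_t, diff)
```

Here z_t = 0.5 gives β − z_t² = 8.75, so no point anomaly is preferred, while z_t = 4 gives −7.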

In the case of a point anomaly in variance, μ_k = 0 and σ̂_k = z_t², so a naive computation of the cost gives

$$ C_{P_V}\left(y_t \left| m_t, s_t, \hat{\sigma}_k\right.\right) = \log\left(2\pi s_t\right) + \log\left(z_t^2\right) + 1 + \beta $$

and

$$ C_{P_V}\left(y_t \left| m_t, s_t, \hat{\sigma}_k\right.\right) - C_{B}\left(y_t \left| m_t, s_t\right.\right) = \log\left(z_t^2\right) + 1 + \beta - z_t^2 $$
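Evaluating this naive difference for a few small values of z_t shows the problem directly. The helper name is hypothetical and β = 10 is only an illustrative penalty.

```python
import math


def naive_point_variance_difference(z_t, beta):
    """C_{P_V} - C_B without correction: log(z_t**2) + 1 + beta - z_t**2.

    Undefined at z_t = 0, where math.log raises ValueError.
    """
    return math.log(z_t ** 2) + 1 + beta - z_t ** 2


beta = 10.0
for z_t in (1.0, 1e-2, 1e-5):
    print(z_t, naive_point_variance_difference(z_t, beta))
```

The difference is positive at z_t = 1 and z_t = 0.01 but negative at z_t = 10⁻⁵, so an observation lying almost exactly on m_t would be flagged.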

Since log(z_t²) → −∞ as z_t² → 0, the naive definition of a point anomaly in variance will always produce point anomalies when z_t is close to 0. Fisch et al. introduce a term γ to control this. The modified cost of a point anomaly in variance is expressed as

$$ C_{P_V}\left(y_t \left| m_t, s_t, \hat{\sigma}_k, \gamma\right.\right) = \log\left(2\pi s_t\right) + \log\left(\gamma + z_t^2\right) + 1 + \beta $$

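Relating this to the baseline cost, we see that point anomalies may be accepted in the capa search when

$$ f\left(z_t,\gamma,\beta\right) = C_{P_V}\left(y_t \left| m_t, s_t, \hat{\sigma}_k, \gamma\right.\right) - C_{B}\left(y_t \left| m_t, s_t\right.\right) = \log\left(\gamma + z_t^2\right) + 1 + \beta - z_t^2 < 0 $$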

To ensure that anomalies are not declared when z_t is close to 0, γ should be selected such that

- f(0, γ, β) ≥ 0;
- γ < 1, so that the gradient $$ \frac{\partial}{\partial z_{t}^2} f\left(z_{t},\gamma,\beta\right) = \frac{1}{\gamma + z_{t}^{2}} - 1 > 0 $$ for z_t close to zero.
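Since f(0, γ, β) = log γ + 1 + β, both conditions combine into the interval exp(−(1 + β)) ≤ γ < 1. A sketch of the check, with the hypothetical helper `gamma_is_admissible`:

```python
import math


def gamma_is_admissible(gamma, beta):
    """Check both conditions on gamma for a penalty beta (hypothetical helper).

    f(0, gamma, beta) = log(gamma) + 1 + beta >= 0  <=>  gamma >= exp(-(1 + beta))
    gradient 1 / gamma - 1 > 0 at z_t = 0           <=>  gamma < 1
    """
    return math.exp(-(1.0 + beta)) <= gamma < 1.0
```

For example, with β = 3 both γ = exp(−3) and γ = exp(−4) pass, while γ = 0 and γ = 1 fail.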

The following plot shows, for small z, the impact of three different choices of γ:

- no correction, γ₀ = 0, which allows point anomalies as z_t approaches 0;
- the correction γ₁ = exp(−β) proposed by Fisch et al.;
- the minimal correction γ₂ = exp(−(1 + β)), for which f(0, γ₂, β) = 0.
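The comparison in the plot can be approximated by tabulating f for the three choices of γ. This sketch assumes an illustrative β = 3 and a hypothetical helper `point_variance_difference`; it prints values rather than drawing a plot.

```python
import math


def point_variance_difference(z_t, gamma, beta):
    """f(z_t, gamma, beta) = log(gamma + z_t**2) + 1 + beta - z_t**2."""
    return math.log(gamma + z_t ** 2) + 1 + beta - z_t ** 2


beta = 3.0
gammas = {
    "gamma0 = 0": 0.0,
    "gamma1 = exp(-beta)": math.exp(-beta),
    "gamma2 = exp(-(1 + beta))": math.exp(-(1 + beta)),
}
for z_t in (0.001, 0.1, 1.0, 3.0):
    row = ", ".join(f"{name}: {point_variance_difference(z_t, g, beta):.3f}"
                    for name, g in gammas.items())
    print(f"z_t = {z_t}: {row}")
```

Only γ₀ gives negative values near z_t = 0; by z_t = 3 the three values differ by less than 0.01.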

The differences between these choices become small as z increases. This is supported by the plot below, which shows, as β varies, the value of z_t above which a point anomaly might occur. Values above the line are potential anomalies.
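The threshold curve can be computed numerically. Writing x = z_t², f is concave in x with its maximum at x = 1 − γ, where f = β + γ > 0, so for 0 ≤ γ < 1 and β > 0 there is a single crossing to the right of the maximum, beyond which f < 0. The sketch below finds it by bisection; the helper names and the choice of bisection are assumptions, not taken from the source.

```python
import math


def f_of_x(x, gamma, beta):
    """f written in terms of x = z_t**2: log(gamma + x) + 1 + beta - x."""
    return math.log(gamma + x) + 1 + beta - x


def anomaly_threshold(gamma, beta, tol=1e-12):
    """Return |z_t| beyond which f < 0, assuming 0 <= gamma < 1 and beta > 0."""
    lo = 1.0 - gamma          # maximiser of f, where f = beta + gamma > 0
    hi = lo + 1.0
    while f_of_x(hi, gamma, beta) >= 0:  # expand until f changes sign
        hi *= 2.0
    while hi - lo > tol:                 # bisection on x = z_t**2
        mid = 0.5 * (lo + hi)
        if f_of_x(mid, gamma, beta) >= 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))


for beta in (1.0, 5.0, 10.0):
    print(beta, [round(anomaly_threshold(g, beta), 3)
                 for g in (0.0, math.exp(-beta), math.exp(-(1 + beta)))])
```

Each printed row lists the thresholds for γ₀, γ₁ and γ₂ at one value of β.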