The purpose of this vignette is to present the calculations for a piecewise linear regression in which each time step has multiple independent observations.

In the following, variables identified by Greek letters are considered unknown.

Linear regression

At time step t the vector of iid observations y_t = {y_{t,1}, …, y_{t,p}} is explained by the design matrix X_t and modelled as a multivariate Gaussian distribution. Consider known "background" parameters m_t and precision matrix S_t = U_t′U_t, deviations from which are modelled by θ and Λ through the likelihood

$$ L\left(\mathbf{y}_{t} \left| \theta,\Lambda\right.\right) = \left(2\pi\right)^{-p/2} \det\left(\mathbf{S}_{t}\right)^{1/2} \det\left(\Lambda\right)^{1/2} \exp\left(-\frac{1}{2}\left( \mathbf{y}_{t} - \mathbf{X}_{t} \mathbf{m}_{t} - \mathbf{X}_{t} \theta\right)^{\prime} \mathbf{U}_{t}^{\prime} \Lambda \mathbf{U}_{t} \left( \mathbf{y}_{t} - \mathbf{X}_{t} \mathbf{m}_{t} - \mathbf{X}_{t} \theta\right) \right) $$

Pre-whitening the known values so that $\hat{\mathbf{y}}_{t} = \mathbf{U}_{t} \left(\mathbf{y}_{t} - \mathbf{X}_{t} \mathbf{m}_{t}\right)$ and $\hat{\mathbf{X}}_{t} = \mathbf{U}_{t} \mathbf{X}_{t}$ gives

$$ L\left(\mathbf{y}_{t} \left| \theta,\Lambda\right.\right) = \left(2\pi\right)^{-p/2} \det\left(\mathbf{S}_{t}\right)^{1/2} \det\left(\Lambda\right)^{1/2} \exp\left(-\frac{1}{2}\left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta\right)^{\prime} \Lambda \left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta\right) \right) $$
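The pre-whitening step can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated values for X_t, m_t, y_t and S_t (all hypothetical). It takes U_t from a Cholesky factorisation of the precision matrix and confirms that, with Λ set to the identity, the quadratic form is unchanged by whitening.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 2                      # observations per time step, regressors

X = rng.normal(size=(p, q))      # design matrix X_t (hypothetical values)
m = rng.normal(size=q)           # background parameters m_t
y = rng.normal(size=p)           # observations y_t

# Build a symmetric positive-definite precision matrix S_t.
B = rng.normal(size=(p, p))
S = B @ B.T + p * np.eye(p)

# NumPy returns a lower-triangular L with S = L L'; taking U = L' gives S = U'U.
U = np.linalg.cholesky(S).T

y_hat = U @ (y - X @ m)          # whitened deviation from the background
X_hat = U @ X                    # whitened design matrix

theta = rng.normal(size=q)       # an arbitrary value of theta
r = y - X @ m - X @ theta
quad_original = r @ S @ r        # quadratic form before whitening (Lambda = I)
r_hat = y_hat - X_hat @ theta
quad_whitened = r_hat @ r_hat    # quadratic form after whitening (Lambda = I)
```

Any factor satisfying S_t = U_t′U_t would serve here; the Cholesky factor is used only because it is convenient to compute.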

Grouping the known values into K_t = p log(2π) − log(det S_t), the log-likelihood is $$ l\left(\mathbf{y}_{t} \left| \theta,\Lambda \right.\right) = -\frac{1}{2}K_{t} + \frac{1}{2}\log\left( \det\left(\Lambda\right)\right) -\frac{1}{2}\left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta\right)^{\prime} \Lambda \left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta\right) $$

Suppose an anomaly with common parameters occurs over n_k consecutive time steps in the set T_k. The log-likelihood of y_{t ∈ T_k} is $$ l\left(\mathbf{y}_{t \in T_{k}} \left| \theta_{k},\Lambda_{k} \right.\right) = -\frac{1}{2}\sum_{t \in T_{k}}K_{t} + \frac{n_{k}}{2}\log\left( \det\left(\Lambda_{k}\right)\right) -\frac{1}{2}\sum_{t \in T_{k}}\left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta_{k}\right)^{\prime} \Lambda_{k} \left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta_{k}\right) $$

with the cost being twice the negative log likelihood plus a penalty β giving

$$ C\left(\mathbf{y}_{t \in T_{k}} \left| \theta_{k}, \Lambda_{k} \right.\right) = \sum_{t \in T_{k}}K_{t} - n_{k}\log\left( \det\left(\Lambda_{k}\right)\right) +\sum_{t \in T_{k}}\left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta_{k}\right)^{\prime} \Lambda_{k} \left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta_{k}\right) + \beta $$

Sufficient statistics

Computation is greatly aided by keeping adequate sufficient statistics. Expanding the summation in the cost gives $$ \sum_{t \in T_{k}}\left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta_{k}\right)^{\prime} \Lambda_{k} \left( \hat{\mathbf{y}}_{t} - \hat{\mathbf{X}}_{t} \theta_{k}\right) = \sum_{t \in T_{k}} \left( \hat{\mathbf{y}}^{\prime}_{t} \Lambda_{k} \hat{\mathbf{y}}_{t} + \theta^{\prime}_{k}\hat{\mathbf{X}}^{\prime}_{t} \Lambda_{k} \hat{\mathbf{X}}_{t} \theta_{k} - 2 \theta^{\prime}_{k} \hat{\mathbf{X}}^{\prime}_{t}\Lambda_{k} \hat{\mathbf{y}}_{t} \right) $$ $$ = \sum_{t \in T_{k}} \left( \mathrm{tr}\left( \hat{\mathbf{y}}_{t}\hat{\mathbf{y}}^{\prime}_{t} \Lambda_{k} \right) + \theta^{\prime}_{k}\hat{\mathbf{X}}^{\prime}_{t} \Lambda_{k} \hat{\mathbf{X}}_{t} \theta_{k} - 2 \theta^{\prime}_{k} \hat{\mathbf{X}}^{\prime}_{t}\Lambda_{k} \hat{\mathbf{y}}_{t} \right) $$
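As an illustration of how such statistics might be kept, the sketch below accumulates three running sums over the time steps and recovers the quadratic term of the cost from them. It assumes NumPy, randomly generated whitened data, and the special case where Λ_k is the identity; the names `A`, `b` and `Syy` are hypothetical and not taken from any package.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 5, 3, 2                          # time steps, observations, regressors
X_hat = rng.normal(size=(n, p, q))         # whitened design matrices (hypothetical)
y_hat = rng.normal(size=(n, p))            # whitened observations (hypothetical)
theta = rng.normal(size=q)

# Running sums that can be updated one time step at a time.
A = np.zeros((q, q))     # sum of X_hat' X_hat
b = np.zeros(q)          # sum of X_hat' y_hat
Syy = np.zeros((p, p))   # sum of y_hat y_hat'
for Xt, yt in zip(X_hat, y_hat):
    A += Xt.T @ Xt
    b += Xt.T @ yt
    Syy += np.outer(yt, yt)

# Quadratic term of the cost rebuilt from the stored sums (Lambda_k = identity).
from_stats = np.trace(Syy) + theta @ A @ theta - 2 * theta @ b

# The same term evaluated directly from the data.
direct = sum((yt - Xt @ theta) @ (yt - Xt @ theta)
             for Xt, yt in zip(X_hat, y_hat))
```

Once the sums are stored, the quadratic term can be evaluated for any θ_k without revisiting the individual observations.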

Baseline: No Anomaly

Here θ_k = 0, Λ_k is an identity matrix and there is no penalty, so β = 0. The resulting cost is $$ C_{B}\left(\mathbf{y}_{t \in T_{k}} \left| \theta_{k}, \Lambda_{k} \right.\right) = \sum_{t \in T_{k}} K_{t} + \sum_{t \in T_{k}} \hat{\mathbf{y}}_{t}^{\prime} \hat{\mathbf{y}}_{t} $$

Collective Anomalies

Anomaly in Regression parameters

There is no change in variance so Λ_k is an identity matrix. The estimate θ̂_k of θ_k can be selected to minimise the cost by taking

$$ \hat{\theta}_{k} = \left( \sum\limits_{t \in T_k} \hat{\mathbf{X}}_{t}^{\prime} \hat{\mathbf{X}}_{t} \right)^{-1} \left( \sum\limits_{t \in T_k} \hat{\mathbf{X}}_{t}^{\prime} \hat{\mathbf{y}}_{t} \right) $$

Substituting this estimate into the cost gives

$$ C_{C}\left(\mathbf{y}_{t \in T_{k}} \left| \theta_{k}, \Lambda_{k}\right.\right) = \sum_{t \in T_{k}} K_{t} + \left( \sum_{t \in T_{k}} \hat{\mathbf{y}}_{t}^{\prime} \hat{\mathbf{y}}_{t} \right) - \left( \sum\limits_{t \in T_k} \hat{\mathbf{X}}_{t}^{\prime} \hat{\mathbf{y}}_{t} \right)^{\prime} \left( \sum\limits_{t \in T_k} \hat{\mathbf{X}}_{t}^{\prime} \hat{\mathbf{X}}_{t} \right)^{-1} \left( \sum\limits_{t \in T_k} \hat{\mathbf{X}}_{t}^{\prime} \hat{\mathbf{y}}_{t} \right) +\beta $$
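The estimate and cost for an anomaly in the regression parameters can be sketched as follows. This assumes NumPy, whitened inputs and an identity Λ_k; the function name `regression_anomaly_cost`, the simulated data and the choice K_t = 0 are hypothetical and serve only to show the calculation.

```python
import numpy as np

def regression_anomaly_cost(X_hat, y_hat, K, beta):
    """Estimate theta_k and the cost over a segment of whitened data."""
    A = sum(Xt.T @ Xt for Xt in X_hat)                    # sum of X_hat' X_hat
    b = sum(Xt.T @ yt for Xt, yt in zip(X_hat, y_hat))    # sum of X_hat' y_hat
    c = sum(yt @ yt for yt in y_hat)                      # sum of y_hat' y_hat
    theta_hat = np.linalg.solve(A, b)                     # minimiser of the cost
    cost = np.sum(K) + c - b @ theta_hat + beta
    return theta_hat, cost

# Simulated segment with a shift in the regression parameters.
rng = np.random.default_rng(2)
n, p, q = 6, 4, 2
X_hat = rng.normal(size=(n, p, q))
true_theta = np.array([1.5, -2.0])
y_hat = np.einsum("tpq,q->tp", X_hat, true_theta) + rng.normal(size=(n, p))
K = np.zeros(n)          # known constants K_t, set to zero for illustration
beta = 3.0               # hypothetical penalty

theta_hat, cost = regression_anomaly_cost(X_hat, y_hat, K, beta)
baseline = np.sum(K) + np.sum(y_hat ** 2)   # cost with theta_k = 0 and no penalty
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; the segment would be preferred to the baseline when `cost < baseline`.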

Anomaly in Variance

There is no anomaly in the regression parameters so θ_k = 0. The estimate of σ_k therefore changes to

$$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_{k}} \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} $$

while the cost is $$ C_{C}\left(\mathbf{y}_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = \sum_{t \in T_{k}} K_{t} + n_{k}\log\left(\hat{\sigma}_{k}\right) + n_{k} + \beta $$
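A small numerical sketch of the variance anomaly follows. It assumes the quadratic form for each time step has already been computed and is supplied as a plain number; the function names and the values of `quad`, `K` and `beta` are hypothetical. The second function evaluates the cost at an arbitrary σ, which makes it possible to check that the estimate is the minimiser.

```python
import math

def variance_anomaly_cost(quad, K, beta):
    """quad[t] holds the quadratic form for time step t in the segment."""
    n = len(quad)
    sigma_hat = sum(quad) / n
    cost = sum(K) + n * math.log(sigma_hat) + n + beta
    return sigma_hat, cost

def cost_at(sigma, quad, K, beta):
    """Cost of the segment for an arbitrary value of sigma."""
    n = len(quad)
    return sum(K) + n * math.log(sigma) + sum(quad) / sigma + beta

quad = [0.8, 3.1, 2.4, 5.0]   # hypothetical quadratic forms
K = [0.0] * len(quad)         # known constants K_t, set to zero for illustration
beta = 2.0                    # hypothetical penalty

sigma_hat, cost = variance_anomaly_cost(quad, K, beta)
```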

Anomaly in regression parameters and variance

Since $$ \sum_{t \in T_{k}} \left( \hat{\mathbf{y}}_{t} - \mathbf{X}_{t} \theta_{k}\right)^{\prime} \mathbf{S}_{t}^{-1} \left( \hat{\mathbf{y}}_{t} - \mathbf{X}_{t} \theta_{k}\right) = \sum_{t \in T_{k}} \left( \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} - 2 \theta_{k}^{\prime}\mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1} \hat{\mathbf{y}}_{t} + \theta_{k}^{\prime}\mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\mathbf{X}_{t} \theta_{k} \right) $$

the estimate θ̂_k of θ_k can be selected to minimise the cost by taking $$ \hat{\theta}_{k} = \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\mathbf{X}_{t} \right)^{-1} \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right) $$

Substitution of this result into the cost gives $$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \sum_{t \in T_{k}} \left( \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} - 2 \hat{\theta}_{k}^{\prime}\mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1} \hat{\mathbf{y}}_{t} + \hat{\theta}_{k}^{\prime}\mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\mathbf{X}_{t} \hat{\theta}_{k} \right) $$ which simplifies to $$ \hat{\sigma}_{k} = \frac{1}{n_{k}} \left[ \left( \sum_{t \in T_{k}} \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right) - \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right)^{\prime} \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\mathbf{X}_{t} \right)^{-1} \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right) \right] $$

The cost is given by $$ C_{C}\left(\mathbf{y}_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = \sum_{t \in T_{k}} K_{t} + n_{k}\log\left(\hat{\sigma}_{k}\right) + n_{k} + \beta $$
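The joint estimate can be computed from the three sums appearing in the simplified expression for σ̂_k. The sketch below assumes NumPy and treats those sums as inputs `A`, `b` and `c` (hypothetical names). The example data take S_t to be the identity, purely for illustration, so the sums are unweighted.

```python
import numpy as np

def joint_anomaly_cost(A, b, c, n, K_sum, beta):
    """A, b, c are the three sums in the simplified expression for sigma_hat."""
    rss = c - b @ np.linalg.solve(A, b)   # quadratic form left after fitting theta_hat
    sigma_hat = rss / n
    cost = K_sum + n * np.log(sigma_hat) + n + beta
    return sigma_hat, cost

# Hypothetical data with S_t equal to the identity.
rng = np.random.default_rng(3)
n, p, q = 8, 3, 2
X = rng.normal(size=(n, p, q))
y = np.einsum("tpq,q->tp", X, np.array([0.5, -1.0])) + 2.0 * rng.normal(size=(n, p))

A = np.einsum("tpi,tpj->ij", X, X)   # sum of X_t' X_t
b = np.einsum("tpi,tp->i", X, y)     # sum of X_t' y_t
c = np.sum(y ** 2)                   # sum of y_t' y_t

sigma_hat, cost = joint_anomaly_cost(A, b, c, n, K_sum=0.0, beta=3.0)
```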

Anomaly in Regression parameters

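There is no change in variance so σ_k = 1. The estimate θ̂_k is unchanged, which gives a cost of $$ C_{C}\left(\mathbf{y}_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = \sum_{t \in T_{k}} K_{t} + \left( \sum_{t \in T_{k}} \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right) - \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right)^{\prime} \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\mathbf{X}_{t} \right)^{-1} \left( \sum\limits_{t \in T_k} \mathbf{X}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} \right) +\beta $$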

Point anomaly

A point anomaly occurs at a single time instance and is represented as a variance anomaly. Naively the cost could be computed using the formulae for a variance anomaly as $$ C_{p}\left(\mathbf{y}_{t} \left| \sigma_{t}\right.\right) = K_{t} + n_{t}\log\left(\hat{\sigma}_{t}\right) + n_{t} + \beta $$ with $$ \hat{\sigma}_{t} = \frac{1}{n_{t}} \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t} $$

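Relating this to the background cost we see that point anomalies may be accepted in the capa search when $$ f\left(\hat{\sigma}_{t},\gamma,\beta\right) = C_{p}\left(\mathbf{y}_{t} \left| \mu_{t},\sigma_{t}\right.\right) - C_{B}\left(\mathbf{y}_{t} \left| \mu_{t},\sigma_{t}\right.\right) = n_{t}\log\left(\hat{\sigma}_{t}\right) + n_{t} + \beta - n_{t}\hat{\sigma}_{t} < 0 $$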

The following plot shows log (σ̂_t) + 1 − σ̂_t which indicates that point anomalies may be declared for both outlying and inlying data.
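The behaviour of this function can also be checked numerically. The sketch below evaluates log(σ̂_t) + 1 − σ̂_t at a few values, then applies the acceptance condition with a hypothetical penalty β = 3 and n_t = 1; the function names `g` and `f` are chosen for illustration only.

```python
import math

def g(sigma):
    """Per-observation change in cost, log(sigma) + 1 - sigma."""
    return math.log(sigma) + 1.0 - sigma

def f(sigma, n, beta):
    """Point anomaly cost minus background cost for n observations."""
    return n * g(sigma) + beta

# g is zero at sigma = 1 and negative on either side of it.
values = {s: g(s) for s in (0.01, 0.5, 1.0, 2.0, 10.0)}

# With beta = 3 and n = 1, both a very small and a large variance
# estimate give f < 0, so both would be accepted as point anomalies.
inlier_accepted = f(0.01, 1, 3.0) < 0
outlier_accepted = f(10.0, 1, 3.0) < 0
moderate_accepted = f(2.0, 1, 3.0) < 0
```

Here `inlier_accepted` and `outlier_accepted` are both true while `moderate_accepted` is false, matching the observation that both inlying and outlying data can trigger a point anomaly.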

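In the case of n_t = 1 Fisch et al. control this by modifying the cost of a point anomaly so it is expressed as $$ C_{p}\left(\mathbf{y}_{t} \left| \sigma_{t}, \mathbf{X}_{t}\right.\right) = \log\left(\exp\left(-\beta\right) + \hat{\sigma}_{t}\right) + K_{t} + 1 + \beta $$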

This has the effect of allowing only outlier anomalies, something that can be achieved much more easily by taking

$$ \hat{\sigma}_{t} = \max\left(1,\frac{1}{n_{t}} \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1}\hat{\mathbf{y}}_{t}\right) $$

giving the cost as

$$ C\left(\mathbf{y}_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = \sum_{t \in T_{k}} K_{t} +n_{k} \log\left(\hat{\sigma}_{k}\right) +\frac{1}{\hat{\sigma}_{k}}\sum_{t \in T_{k}} \left( \hat{\mathbf{y}}_{t}^{\prime} \mathbf{S}_{t}^{-1} \hat{\mathbf{y}}_{t} \right) +\beta $$
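The effect of bounding the estimate below by 1 can be seen by comparing the point anomaly cost with the background cost for a single time step. The sketch below is illustrative only: `quad` stands for the quadratic form at that time step, and the function name, the `clip` switch and the penalty β = 3 are hypothetical.

```python
import math

def point_cost_minus_baseline(quad, n, beta, clip=True):
    """Point anomaly cost minus background cost for one time step."""
    sigma_hat = quad / n
    if clip:
        sigma_hat = max(1.0, sigma_hat)   # bounded estimate
    return n * math.log(sigma_hat) + quad / sigma_hat - quad + beta

beta = 3.0

# Inlying data: the unbounded estimate can still reduce the cost...
inlier_naive = point_cost_minus_baseline(0.01, 1, beta, clip=False)
# ...whereas the bounded estimate leaves only the penalty, so it is never accepted.
inlier_clipped = point_cost_minus_baseline(0.01, 1, beta, clip=True)

# Outlying data: the bound is inactive and the two versions agree.
outlier_naive = point_cost_minus_baseline(10.0, 1, beta, clip=False)
outlier_clipped = point_cost_minus_baseline(10.0, 1, beta, clip=True)
```

For any time step with an estimate at or below 1, the bounded version returns exactly β, which is positive, so only outlying data can be declared a point anomaly.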
