---
title: "Univariate Gaussian Cost Calculations"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Univariate Gaussian Cost Calculations}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette presents the cost calculations for the univariate Gaussian distribution.

Each time step $t$ belongs to a group $k$ whose time stamps form the set $T_{k}$, which contains $n_{k}$ time steps. A group can have an additive mean anomaly $\mu_{k}$ and a multiplicative variance anomaly $\sigma_{k}$, both common to all $t \in T_{k}$.

Assuming the mean $m_{t}$ and variance $s_{t}$ of the data generating distribution are known *a priori* gives, for $t \in T_{k}$,

\[
P\left(y_t \left| m_{t},s_{t}, \mu_k,\sigma_k\right.\right) =
\frac{1}{\sqrt{2\pi\sigma_{k}s_{t}}}\exp\left(-\frac{1}{2\sigma_{k}s_{t}}\left(y_{t} - m_t - \mu_{k}\right)^2\right)
\]

The cost is computed as twice the negative log likelihood, summed over $t \in T_{k}$, plus a penalty term $\beta$, giving

\[
C\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \mu_k,\sigma_k\right.\right) =
n_{k} \log\left(2\pi \sigma_{k}\right)
+\sum\limits_{t \in T_{k}} \log\left(s_{t}\right)
+\frac{1}{\sigma_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t - \mu_{k}\right)^2}{s_{t}}
+ \beta
\]

### No Anomaly (Baseline)

Here $\mu_{k}=0$ and $\sigma_{k}=1$ and there is no penalty, so the cost is

\[
C_{B}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}} \right.\right) =
n_{k} \log\left(2\pi \right)
+\sum\limits_{t \in T_{k}} \log\left(s_{t}\right)
+\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t\right)^2}{s_{t}}
\]

### Collective Anomalies

Collective anomalies last more than a single time step and change the mean and/or variance.

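
Before specialising to each anomaly type, the general cost and the baseline cost defined above can be sketched in R. This is a minimal illustration only: `gauss_cost` is a hypothetical helper, not a function from any package, and the data are made up. The final line cross-checks the baseline cost against `dnorm`.

```r
# Hypothetical helper: twice the negative log likelihood of a group
# plus the penalty beta.
# y, m, s : observations, known means and known variances for t in T_k
# mu      : additive mean anomaly
# sigma   : multiplicative variance anomaly
gauss_cost <- function(y, m, s, mu = 0, sigma = 1, beta = 0) {
  n <- length(y)
  n * log(2 * pi * sigma) + sum(log(s)) +
    sum((y - m - mu)^2 / s) / sigma + beta
}

# Made-up data for illustration only
y <- c(0.3, -1.2, 2.5, 0.8)
m <- c(0, 0, 1, 1)
s <- c(1, 2, 1, 0.5)

# Baseline cost: mu = 0, sigma = 1 and no penalty
c_b <- gauss_cost(y, m, s)

# Cross-check against the Gaussian log density with variance s
c_check <- -2 * sum(dnorm(y, mean = m, sd = sqrt(s), log = TRUE))
all.equal(c_b, c_check)
```

Setting `mu`, `sigma` and `beta` to other values gives the penalised cost for a candidate anomaly in the same group.
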
#### Anomaly in Mean and Variance

Estimates $\hat{\mu}_{k}$ of $\mu_{k}$ and $\hat{\sigma}_{k}$ of $\sigma_{k}$ can be selected to minimise the cost by taking

\[
\hat{\mu}_{k} = \left( \sum\limits_{t \in T_k} \frac{y_t-m_t}{s_t} \right)\left( \sum\limits_{t \in T_k} \frac{1}{s_t}\right)^{-1}
\]

and

\[
\hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t - \hat{\mu}_{k}\right)^2}{s_t}
\]

Substituting these into the cost gives

\[
C_{MV}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k,\hat{\sigma}_k\right.\right) =
n_{k} \log\left(2\pi \hat{\sigma}_{k}\right)
+\sum\limits_{t \in T_{k}} \log\left(s_{t}\right)
+n_{k} + \beta
\]

#### Anomaly in Mean

There is no change in variance, so $\sigma_{k}=1$. The estimate $\hat{\mu}_{k}$ is unchanged from that for an anomaly in mean and variance, so the cost is

\[
C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) =
n_{k} \log\left(2\pi\right)
+\sum\limits_{t \in T_{k}} \log\left(s_{t}\right)
+\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t - \hat{\mu}_{k}\right)^2}{s_t}
+ \beta
\]

which, expanding the square and using the definition of $\hat{\mu}_{k}$, can be written as

\[
C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) =
n_{k} \log\left(2\pi\right)
+\sum\limits_{t \in T_{k}} \log\left(s_{t}\right)
+\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t\right)^2}{s_t}
-\hat{\mu}_{k}^{2} \sum\limits_{t \in T_k} \frac{ 1}{s_t}
+ \beta
\]

#### Anomaly in Variance

There is no mean anomaly, so $\mu_{k}=0$. The estimate $\hat{\sigma}_{k}$ therefore changes to

\[
\hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t\right)^2}{s_t}
\]

and the cost is

\[
C_{V}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\sigma}_k\right.\right) =
n_{k} \log\left(2\pi \hat{\sigma}_{k}\right)
+\sum\limits_{t \in T_{k}} \log\left(s_{t}\right)
+n_{k} + \beta
\]

### Point Anomaly

A point anomaly at time $t$ is treated as a single time step with a change in mean or variance.
However, the cost of the point anomaly should be higher than the background cost when $y_{t}$ is, in some sense, close to the background.

The cost of a point anomaly in mean is expressed as

\[
C_{P_{M}}\left(y_{t}\left| m_{t},s_{t},\hat{\mu}_{k}\right.\right) = \log\left(2\pi s_{t}\right) + \beta
\]

while its value relative to the baseline cost can be expressed using the standardised variable $z_{t} = \frac{y_t-m_t}{\sqrt{s_{t}}}$ as

\[
C_{P_{M}}\left(y_{t}\left| m_{t},s_{t},\hat{\mu}_{k}\right.\right) -
C_{B}\left(y_{t}\left| m_{t},s_{t}\right.\right) = \beta - z_{t}^{2}
\]

The penalty value in this case can then be clearly linked to the number of standard deviations away from the mean at which to declare a point anomaly: the point anomaly is cheaper than the baseline when $\left|z_{t}\right| > \sqrt{\beta}$.

In the case of a point anomaly in variance, a naive computation of the cost gives

\[
C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k}\right.\right)=
\log\left(2\pi s_{t} \right) + \log\left(z_{t}^{2}\right) + 1 + \beta
\]

and

\[
C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k}\right.\right) -
C_{B}\left(y_{t}\left| m_{t},s_{t}\right.\right) = \log\left(z_{t}^{2}\right) + 1 + \beta - z_{t}^2
\]

Since $\log\left(z_{t}^{2}\right) \rightarrow -\infty$ as $z_{t}^{2} \rightarrow 0$, the naive definition of a point anomaly in variance will always produce point anomalies when $z_{t}$ is close to 0. Fisch et al. introduce a term $\gamma$ to control this.
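
The behaviour of the two relative costs above can be checked numerically. The following sketch uses hypothetical helper names and an arbitrary penalty; it evaluates both relative costs at a few values of $z_{t}$, where a negative value means the point anomaly is cheaper than the baseline.

```r
# Relative cost (point anomaly minus baseline) for an anomaly in mean
rel_cost_mean <- function(z, beta) beta - z^2

# Relative cost for the naive point anomaly in variance
rel_cost_var_naive <- function(z, beta) log(z^2) + 1 + beta - z^2

beta <- 10                   # arbitrary penalty, for illustration only
z <- c(1e-4, 0.1, 1, 3, 5)   # standardised observations

data.frame(
  z = z,
  mean = rel_cost_mean(z, beta),
  var_naive = rel_cost_var_naive(z, beta)
)
```

With these values the mean cost is negative only at $z = 5$, which exceeds $\sqrt{10}$, whereas the naive variance cost is negative both at $z = 5$ and at $z = 10^{-4}$.
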

The modified cost of a point anomaly in variance is expressed as

\[
C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k},\gamma\right.\right)=
\log\left(2\pi s_{t} \right) + \log\left(\gamma + z_{t}^{2}\right) + 1 + \beta
\]

Relating this to the background cost, we see that point anomalies may be accepted in the capa search when

\[
f\left(z_{t},\gamma,\beta\right) = C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k},\gamma\right.\right) -
C_{B}\left(y_{t}\left| m_{t},s_{t}\right.\right) = \log\left(\gamma + z_{t}^{2}\right) + 1 + \beta - z_{t}^2 < 0
\]

To ensure that anomalies are not declared when $z_{t}$ is close to 0, $\gamma$ should be selected such that

1. $f\left(0,\gamma,\beta\right) \geq 0$, that is $\gamma \geq \exp\left(-\left(1+\beta\right)\right)$
2. $\gamma < 1$ so the gradient
   \[
   \frac{\partial}{\partial z_{t}^2} f\left(z_{t},\gamma,\beta\right) = \frac{1}{\gamma + z_{t}^{2}} - 1 > 0
   \]
   for $z_{t}$ close to zero.

The following plot shows the impact for small $z$ of three different choices of $\gamma$:

- No correction, $\gamma_{0} = 0$, which allows point anomalies as $z_{t}$ approaches 0
- The correction $\gamma_{1} = \exp\left(-\beta\right)$ proposed by Fisch et al.
- The minimal correction $\gamma_{2} = \exp\left(-\left(1+\beta\right)\right)$ for which $f\left(0,\gamma_{2},\beta\right) = 0$.

```{r f, echo=FALSE, fig.width=7, fig.height=7}
# Relative cost of a point anomaly in variance as a function of z^2
fz <- function(zsq, gamma, beta) {
  log(gamma + zsq) + 1 + beta - zsq
}
zsq <- seq(0, 1e-3, length.out = 10000)
b <- 10  # penalty used for the plot
Y <- matrix(NA, length(zsq), 3)
colnames(Y) <- c("gamma_0", "gamma_1", "gamma_2")
Y[, 1] <- fz(zsq, 0, b)
Y[, 2] <- fz(zsq, exp(-b), b)
Y[, 3] <- fz(zsq, exp(-(1 + b)), b)
matplot(sqrt(zsq), Y, type = "l", xlab = "|z|", ylab = "f(z,gamma,10)")
legend("bottomright", colnames(Y), col = 1:3, lty = 1:3)
```

The differences between the three choices become small as $z$ increases. This is supported by the plot below, which shows the value of $z_{t}$ at which a point anomaly might occur as $\beta$ varies. The area above each line contains the potential anomaly values.

```{r when_anaom, echo=FALSE, fig.width=7, fig.height=7}
betaRng <- seq(log(2), 2 * log(10), length.out = 1000)
rng <- c(1.25, 25)  # search interval for the root in z^2
Y <- matrix(NA, length(betaRng), 3)
colnames(Y) <- c("gamma_0", "gamma_1", "gamma_2")
for (ii in seq_along(betaRng)) {
  b <- betaRng[ii]
  # value of z^2 at which f changes sign for each choice of gamma
  Y[ii, 1] <- uniroot(fz, rng, gamma = 0, beta = b)$root
  Y[ii, 2] <- uniroot(fz, rng, gamma = exp(-b), beta = b)$root
  Y[ii, 3] <- uniroot(fz, rng, gamma = exp(-(1 + b)), beta = b)$root
}
matplot(betaRng, sqrt(Y), type = "l", xlab = "beta", ylab = "|z|")
legend("bottomright", colnames(Y), col = 1:3, lty = 1:3)
```
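
For a single penalty, the thresholds shown in the plot above can also be computed directly. The sketch below is illustrative only: `f_rel` and `point_threshold` are hypothetical helpers, the penalty is arbitrary, and the search interval is an assumption borrowed from the plotting code. For comparison it also reports the threshold $\sqrt{\beta}$ implied by $\beta - z_{t}^{2} < 0$ for a point anomaly in mean.

```r
# Relative cost of a point anomaly in variance, as a function of z^2
f_rel <- function(zsq, gamma, beta) {
  log(gamma + zsq) + 1 + beta - zsq
}

# Hypothetical helper: |z| at which f_rel changes sign, found as the
# root in z^2 on an assumed bracketing interval
point_threshold <- function(gamma, beta, interval = c(1.25, 25)) {
  sqrt(uniroot(f_rel, interval, gamma = gamma, beta = beta)$root)
}

beta <- 3  # arbitrary penalty, for illustration only
thresholds <- c(
  mean    = sqrt(beta),                        # from beta - z^2 < 0
  gamma_0 = point_threshold(0, beta),
  gamma_1 = point_threshold(exp(-beta), beta),
  gamma_2 = point_threshold(exp(-(1 + beta)), beta)
)
thresholds
```

The default interval only brackets a root for moderate penalties, so `uniroot` stops with an error if `beta` is large enough that `f_rel` has the same sign at both end points.
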