The purpose of this vignette is to present the calculations of the costs for the categorical distribution over \(N\) classes.
Each time step \(t\) belongs to group \(k\) whose time stamps are the set \(T_{k}\). A group has either an a priori known probability of being in each class \(p = \left(p_{1},\ldots,p_{N}\right)\) or an unknown probability of being in each class \(\lambda_{k} = \left(\lambda_{k,1},\ldots,\lambda_{k,N}\right)\)
The data generating distribution gives for \(t \in T_{k}\)
\[ P\left(y_t \left| p \right.\right) = \prod\limits_{i=1}^{N} p_{i}^{y_{t,i}} \]
where \(y_{t,i}=1\) if the \(t\)th sample is in the \(i\)th class and zero otherwise. For convience let there be \(n_{k}\) samples in \(T_{k}\) of which \(n_{k,i}\) are of class \(i\).
The cost is computed as twice the negative log likelhiood and there is no penalty term giving
\[ C_{B}\left(y_{t \in T_{k}} \left| p \right.\right) = -2 \sum\limits_{t \in T_{k}} \sum\limits_{i=1}^{N} y_{t,i} \log\left( p_{i} \right) =-2 \sum\limits_{i=1}^{N} n_{k,i} \log\left( p_{i} \right) \]
In the case of the anomaly the cost is computed by
\[ C_{A}\left(y_{t \in T_{k}} \left| \lambda_{k} \right.\right) = \beta - 2 \sum\limits_{t \in T_{k}} \sum\limits_{i=1}^{N} y_{t,i} \log\left( \hat{\lambda}_{k,i} \right) = \beta - 2 \sum\limits_{i=1}^{N} n_{k,i} \log\left( \hat{\lambda}_{k,i} \right) \]
where
\[ \hat{\lambda}_{k,j} = \frac{ \sum\limits_{t \in T_{k}} y_{t,j} } { \sum\limits_{i=1}^{N} \sum\limits_{t \in T_{k}} y_{t,i} } = \frac{n_{k,j}}{n_{k}} \]
An anomalous region is created when
\[ C_{A}\left(y_{t \in T_{k}} \left| \lambda_{k} \right.\right) - C_{B}\left(y_{t \in T_{k}} \left| p \right.\right) = \beta - 2 \sum\limits_{i=1}^{N} n_{k,i} \left( \log\left( n_{k,i} \right) - \log\left( n_{k} p_{i} \right) \right) <0 \]
In the case of a poitn anomaly \(n_{k}=n_{k,j}=1\) giving
\[ C_{A}\left(y_{t \in T_{k}} \left| \lambda_{k} \right.\right) - C_{B}\left(y_{t \in T_{k}} \left| p \right.\right) = \beta + 2 \log\left( p_{j} \right) \]