The purpose of this vignette is to present the calculations of the costs for the categorical distribution over $N$ classes.
Each time step $t$ belongs to a group $k$, whose time stamps form the set $T_k$. A group has either an a priori known probability of being in each class, $p = (p_1, \ldots, p_N)$, or an unknown, group-specific probability of being in each class, $\lambda_k = (\lambda_{k,1}, \ldots, \lambda_{k,N})$.
For $t \in T_k$, the data-generating distribution gives
$$ P\left(y_t \left| p \right.\right) = \prod\limits_{i=1}^{N} p_{i}^{y_{t,i}} $$
where $y_{t,i} = 1$ if the $t$th sample is in the $i$th class and zero otherwise. For convenience, let $T_k$ contain $n_k$ samples, of which $n_{k,i}$ are of class $i$.
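A minimal Python sketch of these definitions, assuming class labels are coded as integers $0, \ldots, N-1$ (the text indexes classes from 1). The helper names `one_hot` and `class_counts` are hypothetical and not part of any package.

```python
def one_hot(label, n_classes):
    # y_t as a 0/1 vector: y_{t,i} = 1 iff observation t is in class i
    return [1 if i == label else 0 for i in range(n_classes)]


def class_counts(labels, n_classes):
    # n_{k,i} = sum over t in T_k of y_{t,i}, and n_k = sum_i n_{k,i}
    rows = [one_hot(label, n_classes) for label in labels]
    n_ki = [sum(row[i] for row in rows) for i in range(n_classes)]
    return sum(n_ki), n_ki


# A hypothetical group T_k of five observations over N = 3 classes
n_k, n_ki = class_counts([0, 2, 2, 1, 2], n_classes=3)
print(n_k, n_ki)  # 5 [1, 1, 3]
```

Only the counts $n_{k,i}$ are needed by the costs that follow, so the individual $y_t$ can be discarded once they are tallied.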
When the class probabilities are the known $p$, the cost is twice the negative log-likelihood and there is no penalty term, giving
$$ C_{B}\left(y_{t \in T_{k}} \left| p \right.\right) = -2 \sum\limits_{t \in T_{k}} \sum\limits_{i=1}^{N} y_{t,i} \log\left( p_{i} \right) =-2 \sum\limits_{i=1}^{N} n_{k,i} \log\left( p_{i} \right) $$
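The two forms of $C_B$ above can be checked numerically. This is a sketch under illustrative assumptions: the probabilities `p` and the labels are made up, classes are coded from 0, and the function names are hypothetical.

```python
import math


def baseline_cost_per_obs(labels, p):
    # Left-hand form: -2 * sum over t of sum_i y_{t,i} log p_i.
    # With one-hot y_t only the observed class survives, i.e. log p_{label}.
    return -2.0 * sum(math.log(p[label]) for label in labels)


def baseline_cost(n_ki, p):
    # Right-hand form: -2 * sum_i n_{k,i} log p_i. Classes with n_{k,i} = 0
    # add nothing, so they are skipped (this also avoids log(0) when an
    # unobserved class has p_i = 0).
    return -2.0 * sum(n * math.log(pi) for n, pi in zip(n_ki, p) if n > 0)


p = [0.5, 0.3, 0.2]       # assumed known class probabilities
labels = [0, 2, 2, 1, 2]  # one group T_k, classes coded 0..N-1
n_ki = [1, 1, 3]          # counts of the labels above
print(baseline_cost_per_obs(labels, p), baseline_cost(n_ki, p))  # both about 13.45
```

Working with counts makes the cost of a group $O(N)$ once the $n_{k,i}$ are known, rather than $O(n_k)$.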
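For an anomalous group, with unknown $\lambda_k$, the cost is twice the negative log-likelihood evaluated at the estimate $\hat{\lambda}_k$, plus a penalty $\beta$, giving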
$$ C_{A}\left(y_{t \in T_{k}} \left| \lambda_{k} \right.\right) = \beta - 2 \sum\limits_{t \in T_{k}} \sum\limits_{i=1}^{N} y_{t,i} \log\left( \hat{\lambda}_{k,i} \right) = \beta - 2 \sum\limits_{i=1}^{N} n_{k,i} \log\left( \hat{\lambda}_{k,i} \right) $$
where the maximum likelihood estimate of $\lambda_{k,j}$ is
$$ \hat{\lambda}_{k,j} = \frac{ \sum\limits_{t \in T_{k}} y_{t,j} } { \sum\limits_{i=1}^{N} \sum\limits_{t \in T_{k}} y_{t,i} } = \frac{n_{k,j}}{n_{k}} $$
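A minimal sketch of $\hat{\lambda}_k$ and $C_A$ from the class counts. The penalty value `beta=4.0` is a hypothetical choice for illustration only, and the group is assumed to be non-empty ($n_k > 0$).

```python
import math


def mle(n_ki):
    # lambda-hat_{k,j} = n_{k,j} / n_k (assumes n_k > 0)
    n_k = sum(n_ki)
    return [n / n_k for n in n_ki]


def anomaly_cost(n_ki, beta):
    # C_A = beta - 2 * sum_i n_{k,i} log(lambda-hat_{k,i}). A class with
    # n_{k,i} = 0 has lambda-hat = 0 and contributes 0 (0 log 0 = 0).
    lam = mle(n_ki)
    return beta - 2.0 * sum(n * math.log(l) for n, l in zip(n_ki, lam) if n > 0)


n_ki = [1, 1, 3]
print(mle(n_ki))  # [0.2, 0.2, 0.6]
print(anomaly_cost(n_ki, beta=4.0))
```

Because $\hat{\lambda}_k$ maximises the likelihood, $C_A - \beta$ is never larger than $C_B$ for the same group; the penalty $\beta$ is what stops every group from being declared anomalous.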
Substituting $\hat{\lambda}_{k,i} = n_{k,i}/n_k$ into $C_A$ shows that an anomalous region is created when
$$ C_{A}\left(y_{t \in T_{k}} \left| \lambda_{k} \right.\right) - C_{B}\left(y_{t \in T_{k}} \left| p \right.\right) = \beta - 2 \sum\limits_{i=1}^{N} n_{k,i} \left( \log\left( n_{k,i} \right) - \log\left( n_{k} p_{i} \right) \right) <0 $$
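The closed form can be checked against $C_A - C_B$ computed term by term. A sketch assuming illustrative values for `p` and a hypothetical penalty `beta = 4.0`; it also assumes $p_i > 0$ for every observed class, since otherwise $\log(n_k p_i)$ is undefined.

```python
import math


def cost_difference(n_ki, p, beta):
    # Closed form: beta - 2 * sum_i n_{k,i} (log n_{k,i} - log(n_k p_i)).
    # Terms with n_{k,i} = 0 are skipped (0 log 0 = 0).
    n_k = sum(n_ki)
    return beta - 2.0 * sum(
        n * (math.log(n) - math.log(n_k * pi)) for n, pi in zip(n_ki, p) if n > 0
    )


def cost_difference_direct(n_ki, p, beta):
    # C_A - C_B computed separately, with lambda-hat_{k,i} = n_{k,i} / n_k
    n_k = sum(n_ki)
    c_a = beta - 2.0 * sum(n * math.log(n / n_k) for n in n_ki if n > 0)
    c_b = -2.0 * sum(n * math.log(pi) for n, pi in zip(n_ki, p) if n > 0)
    return c_a - c_b


def is_anomalous(n_ki, p, beta):
    # Prefer the anomalous model for group k when C_A - C_B < 0
    return cost_difference(n_ki, p, beta) < 0


p = [0.5, 0.3, 0.2]  # known class probabilities (illustrative)
beta = 4.0           # hypothetical penalty value
print(is_anomalous([10, 6, 4], p, beta))  # False: counts match p, difference is about beta
print(is_anomalous([2, 2, 16], p, beta))  # True: the third class is over-represented
```

When the observed proportions equal $p$ every logarithmic term vanishes and the difference reduces to $\beta$, so a group is only flagged once its proportions depart far enough from $p$ to pay for the penalty.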
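For a point anomaly, $n_k = n_{k,j} = 1$ for the single observed class $j$ (all other $n_{k,i} = 0$ and contribute nothing, using the convention $0 \log 0 = 0$), giving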
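$$ C_{A}\left(y_{t \in T_{k}} \left| \lambda_{k} \right.\right) - C_{B}\left(y_{t \in T_{k}} \left| p \right.\right) = \beta + 2 \log\left( p_{j} \right) $$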
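Since $\beta + 2\log(p_j) < 0$ exactly when $p_j < e^{-\beta/2}$, a single observation is flagged as a point anomaly only if its class is rarer than that threshold. A small sketch of this check; the penalty $\beta = 2\log(100)$ is a hypothetical value chosen so the threshold is $0.01$, and the function names are illustrative.

```python
import math


def point_cost_difference(p_j, beta):
    # C_A - C_B for a single observation of class j: beta + 2 log p_j
    return beta + 2.0 * math.log(p_j)


def point_anomaly_threshold(beta):
    # beta + 2 log p_j < 0  <=>  p_j < exp(-beta / 2)
    return math.exp(-beta / 2.0)


beta = 2 * math.log(100)  # hypothetical penalty: flags classes with p_j < 0.01
print(point_anomaly_threshold(beta))           # about 0.01
print(point_cost_difference(0.005, beta) < 0)  # True
print(point_cost_difference(0.05, beta) < 0)   # False
```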