Unexpected results in MAD when predicting
Created by: Quentin62
In the _mad function, the decision score is computed with:
https://github.com/yzhao062/pyod/blob/master/pyod/models/mad.py#L129-L130
diff = np.abs(obs - self.median)
return np.nan_to_num(np.ravel(0.6745 * diff / np.median(diff)))
This function is used for both fit and predict.
Here, the denominator is the median of the absolute differences between the given observations and the fitted median.
The problem is that, at prediction time, np.median(diff) is computed from the observations being scored rather than from the ones used for fitting, which can lead to wrong scores.
For example, if you call decision_function with a single observation, the output score will always be 0.6745, because in that case diff == np.median(diff).
from pyod.models.mad import MAD
import numpy as np
mod = MAD(threshold=3)
x = np.random.normal(size=100).reshape(-1, 1)
mod.fit(x)
mod.median
y = np.array([[1000]]) # obviously an anomaly
mod.decision_function(y) # array([0.6745])
mod.predict(y) # array([0])
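The same effect can be reproduced without pyod. Below is a minimal sketch that mirrors the two lines quoted from mad.py; mad_score is a hypothetical helper name used only for illustration, and the test values are arbitrary:

```python
import numpy as np


def mad_score(obs, median):
    # Hypothetical helper mirroring the quoted computation: the denominator
    # is recomputed from whatever observations are passed in.
    diff = np.abs(obs - median)
    return np.nan_to_num(np.ravel(0.6745 * diff / np.median(diff)))


rng = np.random.default_rng(0)
x = rng.normal(size=100).reshape(-1, 1)
fitted_median = np.median(x)

# Score three very different single observations, one at a time.
single_scores = [
    mad_score(np.array([[value]]), fitted_median)[0]
    for value in (5.0, 1000.0, -1e6)
]
print(single_scores)
```

Each of the three scores should come out as 0.6745 (up to floating-point rounding), regardless of how far the observation is from the fitted median, because a single value is its own median.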
Idea to solve the problem: save the median of the fitted diff in the same way the median is saved:
diff = np.abs(obs - self.median)
self.mediandiff = np.median(diff) if self.mediandiff is None else self.mediandiff
return np.nan_to_num(np.ravel(0.6745 * diff / self.mediandiff))
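To illustrate the idea end to end, here is a rough standalone sketch, not pyod's actual implementation. SimpleMAD and its attribute names (median_, median_diff_) are assumptions made for this example. Unlike the snippet above, which fills self.mediandiff lazily, the sketch sets the value in fit so that refitting replaces it; that is a design choice of the sketch only.

```python
import numpy as np


class SimpleMAD:
    """Hypothetical stand-in for illustration; not pyod's MAD class."""

    def __init__(self, threshold=3.5):
        self.threshold = threshold
        self.median_ = None        # median of the training data
        self.median_diff_ = None   # median absolute deviation of the training data

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.median_ = np.median(X)
        # Store the denominator once, computed from the training data only.
        self.median_diff_ = np.median(np.abs(X - self.median_))
        return self

    def decision_function(self, X):
        diff = np.abs(np.asarray(X, dtype=float) - self.median_)
        # Reuse the fitted denominator instead of recomputing it from X.
        return np.nan_to_num(np.ravel(0.6745 * diff / self.median_diff_))

    def predict(self, X):
        return (self.decision_function(X) > self.threshold).astype(int)


rng = np.random.default_rng(0)
x = rng.normal(size=100).reshape(-1, 1)
model = SimpleMAD(threshold=3).fit(x)

y = np.array([[1000.0]])  # same obvious anomaly as in the example above
score = model.decision_function(y)
label = model.predict(y)
print(score, label)
```

With the denominator fixed at fit time, the score of the single observation depends on its distance from the training median, so 1000 is flagged as an outlier here.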