ECOD implementation does not correspond to what is described in the paper (and, as a side effect, the skewness part does nothing here)
Summary:
Looking at the implementation of decision_function (and likewise its parallel version) in ecod.py, it does not match what is described in the paper (https://arxiv.org/abs/2201.00382).
Furthermore, as a side effect of this implementation, the score obtained when considering the skewness has absolutely no effect, i.e.:
```python
self.O = np.maximum(self.U_l, self.U_r)
self.O = np.maximum(self.U_skew, self.O)
```
will always give the same result as just (one can always comment out `self.O = np.maximum(self.U_skew, self.O)` and see, without even having to think about it, that this is the case):

```python
self.O = np.maximum(self.U_l, self.U_r)
```
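A small self-contained demonstration of the no-op (the arrays here are made up for illustration, not taken from pyod):

```python
import numpy as np

# Hypothetical per-feature scores for 2 points and 3 features
U_l = np.array([[1.0, 5.0, 2.0],
                [3.0, 0.5, 4.0]])
U_r = np.array([[2.0, 1.0, 6.0],
                [1.0, 2.0, 3.0]])

# U_skew picks, per feature, either the left or the right score
# (say features 0 and 2 are left-skewed, feature 1 is right-skewed)
skewness_sign = np.array([-1, 1, -1])
U_skew = np.where(skewness_sign < 0, U_l, U_r)

O_without_skew = np.maximum(U_l, U_r)
O_with_skew = np.maximum(U_skew, O_without_skew)

# The skew term changes nothing: every element of U_skew is already
# bounded above by the element-wise maximum of U_l and U_r
print(np.array_equal(O_without_skew, O_with_skew))  # True
```

This holds for any values, since each element of `U_skew` is by construction one of the two arguments already inside the element-wise maximum.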
This is a big problem in the sense that we are doing work for nothing. The method as implemented can still be useful, but it is not ECOD as described in the paper.
Explanation:
Here is an in-depth explanation:
This correctly calculates the negative log of the left, right, and skewness-corrected CDF for each feature of each data point. However, the next operation then chooses, for each feature of each data point, the larger of the left and right scores for that feature and data point, and discards the other.
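For reference, a minimal pure-numpy sketch of how those per-feature matrices can be built (the variable names mirror the pyod attributes, but the code and data are illustrative, not pyod's):

```python
import numpy as np

# Toy data: 5 points, 2 features
X = np.array([[0.1, 1.0],
              [0.2, 1.1],
              [0.3, 0.9],
              [0.4, 1.0],
              [5.0, 1.2]])

# Empirical left/right tail probabilities per feature, then negative logs
cdf_left = np.mean(X[None, :, :] <= X[:, None, :], axis=1)
cdf_right = np.mean(X[None, :, :] >= X[:, None, :], axis=1)
U_l = -np.log(cdf_left)
U_r = -np.log(cdf_right)

# The sign of each feature's sample skewness decides which tail
# the skewness-corrected score uses
skewness = ((X - X.mean(0)) ** 3).mean(0) / (X.std(0) ** 3)
U_skew = np.where(skewness < 0, U_l, U_r)

print(U_l.shape, U_r.shape, U_skew.shape)  # (5, 2) (5, 2) (5, 2)
```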
This is not what is described in the paper. Here (line 153) we are left with a matrix in which each element is the larger of the left and right scores for that feature and data point. As far as I can tell, there is no notion of a maximum at the feature level in the paper; the maximum is taken at the data-point level. The next line of code then does nothing at all.
Indeed, self.U_skew as calculated (see lines 150 and 151 above) is a matrix in which each element is either the left or the right score of that feature, depending on the sign of that feature's skewness. But each element of self.O is already the maximum of the left and right scores for that feature, so taking the maximum again with either of them gives the same answer: we already have the per-feature maximum. This way of implementing the score therefore eliminates any effect of the skewness score, which I thought was one of the main points of the paper.
Finally, the scores are summed over the features, i.e. within a row (data point) the sum mixes left and right scores, since each feature has its own maximum:
This is not what is described in Eqs. 4, 5 and 6 of the paper.
In conclusion:
PYOD:
The implementation here is: find a left and a right score for each feature of a data point, keep only the larger of the two for each feature, and then sum these per-feature maxima together. Compared to the paper, this effectively swaps the order of the addition and max operations.
PAPER:
In the paper it is instead: find a left, a right, and a skew (left or right, depending on the sign of the skewness) score for each feature of a data point. Sum each of these three score vectors over the features independently: left with left, right with right, skew with skew. Then, for each data point, take the maximum of the three resulting values. This is very different from what is implemented.
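The two orders of operations really do disagree. A minimal numeric example (values are made up; in particular the assumption here is that both features are left-skewed, so the skew score equals the left score):

```python
import numpy as np

# Hypothetical per-feature scores for a single data point with 2 features
U_l = np.array([[4.0, 1.0]])
U_r = np.array([[1.0, 4.0]])
U_skew = U_l.copy()  # suppose both features are left-skewed

# pyod: per-feature max first, then sum over features
score_pyod = np.maximum(np.maximum(U_l, U_r), U_skew).sum(axis=1)

# paper (Eqs. 4-6): sum over features first, then max of the three totals
sums = np.stack([U_l.sum(axis=1), U_r.sum(axis=1), U_skew.sum(axis=1)])
score_paper = sums.max(axis=0)

print(score_pyod, score_paper)  # [8.] [5.] -- the two orders disagree
```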
Discussion:
It seems that the way to implement what is in the paper, if one wants that algorithm, would be to replace the code of lines 153 to 159 with something like:
```python
# Sum across features, independently for left, right and skew
self.O_left = self.U_l.sum(axis=1)
self.O_right = self.U_r.sum(axis=1)
self.O_skew = self.U_skew.sum(axis=1)

# For each row, keep the maximum of the left, right and skew scores
O_tmp = np.concatenate((self.O_left.reshape(-1, 1),
                        self.O_right.reshape(-1, 1),
                        self.O_skew.reshape(-1, 1)), axis=1)
self.O = np.max(O_tmp, axis=1)

# The decision score is just self.O
if hasattr(self, 'X_train'):
    decision_scores_ = self.O[-original_size:]
else:
    decision_scores_ = self.O
```
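To make the fix easy to sanity-check outside pyod, here is a self-contained sketch of the paper-style scoring rule as a standalone function (function and variable names are mine, not pyod's; this is an illustration of Eqs. 4-6, not a drop-in replacement):

```python
import numpy as np

def ecod_scores(X):
    """Paper-style ECOD outlier scores: higher means more anomalous."""
    # Empirical left- and right-tail CDFs, evaluated per feature
    cdf_left = np.mean(X[None, :, :] <= X[:, None, :], axis=1)
    cdf_right = np.mean(X[None, :, :] >= X[:, None, :], axis=1)
    U_l = -np.log(cdf_left)
    U_r = -np.log(cdf_right)

    # Per-feature sample skewness picks the left or right tail
    skewness = ((X - X.mean(0)) ** 3).mean(0) / (X.std(0) ** 3 + 1e-12)
    U_skew = np.where(skewness < 0, U_l, U_r)

    # Sum over features first, then take the max of the three totals
    return np.maximum.reduce([U_l.sum(1), U_r.sum(1), U_skew.sum(1)])

# The point that is extreme in both features gets the highest score
X = np.array([[0.1, 1.0],
              [0.2, 1.1],
              [0.3, 0.9],
              [0.25, 1.0],
              [9.0, 5.0]])
scores = ecod_scores(X)
print(scores.argmax())  # 4
```

With this ordering the skew sum genuinely influences the result, since it can exceed both the pure-left and pure-right totals for a given data point.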