Cook's distance outlier detector
A supervised regression outlier detector
Cook's distance can be used to identify points that negatively affect a regression model. The measure combines each observation's leverage and residual: higher leverage and larger residuals lead to a higher Cook's distance. Read more in :cite:cook1977outlier (https://www.jstor.org/stable/1268249).
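To make the leverage/residual combination concrete, here is a minimal NumPy sketch of the textbook Cook's distance for an ordinary least-squares fit with an intercept. It is an illustration only and is not the code in cd.py or in Yellowbrick; the function name cooks_distance and the toy data are assumptions made for this example.

```python
import numpy as np


def cooks_distance(X, y):
    """Textbook Cook's distance for an OLS fit (hypothetical helper, not PyOD's API)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    n = X.shape[0]

    # Design matrix with an explicit intercept column.
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]

    # Least-squares coefficients and residuals.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta

    # Leverage values: the diagonal of the hat matrix H = A @ pinv(A).
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))

    # Residual variance estimate with n - p degrees of freedom.
    mse = residuals @ residuals / (n - p)

    # D_i = e_i^2 / (p * mse) * h_ii / (1 - h_ii)^2
    return residuals**2 / (p * mse) * leverage / (1.0 - leverage) ** 2


# Toy data: points near a line, plus one high-leverage point far off it.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(30)
x[-1], y[-1] = 3.0, -5.0

scores = cooks_distance(x.reshape(-1, 1), y)
print(int(np.argmax(scores)))  # index of the most influential observation
```

The last observation has both a large residual and a large leverage, so it receives the highest score.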
The script cd.py, which contains the Cook's distance outlier detector, has been added to pyod/models/. The code is mostly based on the implementation in the Yellowbrick repo, but I thought it would be nice to be able to call it alongside all the other outlier detectors in PyOD.
An example and a test script have now been added as well, and the original Cook's distance outlier detector script has been simplified. However, because of the way Cook's distance is calculated, the target variable y is necessary for both the train and test data. The decision function has been rewritten to take X as an appended array of [X, y] (see the example script). Still, because of this, fit_predict and fit_predict_score in the test script will not run without issues. I see that both of these functions are deprecated anyway, so I hope this is not a deal breaker, since the results from this outlier detector are relatively good. If you would prefer that both fit and decision_function take only X as input, I can rewrite it that way, but the user would then have to append the X and y data before either call.
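As a rough illustration of the appended-array convention described above, the sketch below stacks y onto X as the last column and splits it back again. It assumes the target is stored in the final column; the variable names are hypothetical and nothing here calls PyOD itself.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 features, 1 target.
X = np.arange(10, dtype=float).reshape(5, 2)
y = np.array([1.0, 3.0, 5.0, 7.0, 20.0])

# Append the target as the last column, giving an array of shape (5, 3).
Xy = np.column_stack([X, y])

# A detector receiving Xy could recover the two parts like this
# (assumption: the target is always the final column).
X_back, y_back = Xy[:, :-1], Xy[:, -1]

print(Xy.shape)
```

With this layout a single array carries both the features and the target, which is what lets a one-argument call still see y.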
Hopefully this will become a useful addition to the already great repo and Python package that PyOD is.