All Submissions Basics:
Closes #92 (closed)
- Have you followed the guidelines in our Contributing document?
- Have you checked to ensure there aren't other open Pull Requests for the same update/change?
- Have you checked all Issues to tie the PR to a specific one?
All Submissions Cores:
- Have you added an explanation of what your changes do and why you'd like us to include them?
- Have you written new tests for your core changes, as applicable?
- Have you successfully run tests with your changes locally?
- Does your submission pass tests, including CircleCI, Travis CI, and AppVeyor?
- Does your submission have appropriate code coverage? The cutoff threshold is 95% by Coveralls.
New Model Submissions:
- Have you created a .py in ~/pyod/models/?
- Have you created a _example.py in ~/examples/?
- Have you created a test_.py in ~/pyod/test/?
- Have you linted your code locally prior to submission?
Linear Method for Deviation Detection for Large Databases
Based on the work of Arning, A., Agrawal, R., and Raghavan, P. 1996. A linear method for deviation detection in large databases. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), Portland, OR. The Linear Method for Deviation-based Outlier Detection (LMDD) employs the concept of the Smoothing Factor (SF), which indicates how much the dissimilarity can be reduced by removing a subset of elements from the data set.
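To make the Smoothing Factor idea concrete, here is a minimal sketch, not the code in this PR. It assumes the form SF = |remaining| * (D(all) - D(remaining)), i.e. the reduction in dissimilarity scaled by the size of what is left, as I read it from the paper. The function name `smoothing_factor`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def smoothing_factor(data, subset_idx, dissimilarity=np.var):
    """Hypothetical helper: SF of removing `subset_idx` from `data`.

    Assumed form: |remaining| * (D(all) - D(remaining)).
    """
    data = np.asarray(data, dtype=float)
    # Boolean mask that keeps every element except the candidate subset.
    mask = np.ones(len(data), dtype=bool)
    mask[subset_idx] = False
    remaining = data[mask]
    return len(remaining) * (dissimilarity(data) - dissimilarity(remaining))


# Toy 1-D data: the last value is an obvious deviation.
data = [1.0, 1.1, 0.9, 1.0, 10.0]
sf_outlier = smoothing_factor(data, [4])  # remove the deviating value
sf_inlier = smoothing_factor(data, [0])   # remove an ordinary value
```

Removing the deviating value lowers the variance sharply, so its SF is large and positive, while removing an ordinary value makes the remaining data look more spread out and yields a negative SF.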
As the paper makes clear, any dissimilarity function can be used. The paper proposes the variance, but other statistical dispersion measures work as well. Average Absolute Deviation, Variance, and Interquartile Range are already implemented; Median Absolute Deviation will be added once SciPy 1.3.0 is released.
The original algorithm outputs labels; with a very minor tweak, it can now output scores.
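For reference, the three dispersion measures mentioned above could be written with plain NumPy roughly as follows. This is an illustrative sketch rather than the implementation in this PR; the helper names `aad` and `iqr` and the lookup table `DIS_MEASURES` (including the key "iqr") are assumptions.

```python
import numpy as np


def aad(x):
    """Average absolute deviation around the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - np.mean(x)))


def iqr(x):
    """Interquartile range: 75th percentile minus 25th percentile."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25


# Hypothetical lookup table mapping a measure name to its function.
DIS_MEASURES = {"aad": aad, "var": np.var, "iqr": iqr}

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
results = {name: float(func(sample)) for name, func in DIS_MEASURES.items()}
```

Keeping the measures behind a name-to-function mapping means a further measure such as Median Absolute Deviation could be added later without touching the detection logic.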
Side note: the aad dissimilarity measure gives better results in the simulations than var and is even faster to execute. I wonder if the authors of the paper tried all possible measures :D