All Submissions Basics:

- Have you followed the guidelines in our Contributing document?
- Have you checked to ensure there aren't other open Pull Requests for the same update/change?
- Have you checked all Issues to tie the PR to a specific one?
All Submissions Cores:

- Have you added an explanation of what your changes do and why you'd like us to include them?
- Have you written new tests for your core changes, as applicable?
- Have you successfully run tests with your changes locally?
- Does your submission pass tests, including CircleCI, Travis CI, and AppVeyor?
- Does your submission have appropriate code coverage? The cutoff threshold is 95% on Coveralls.
New Model Submissions:

- Have you created a .py in ~/pyod/models/?
- Have you created a _example.py in ~/examples/?
- Have you created a test_.py in ~/pyod/test/?
- Have you linted your code locally prior to submission?
Brief Description
This algorithm generates one or more clusters of data points, with the same or different sizes and densities, according to the parameters supplied by the user. It generates the required ratio of outliers, controlled by the contamination parameter, and distributes them among the clusters.
It uses the make_blobs function provided by sklearn to create the clusters; the main part of the algorithm maintains and validates the consistency of the data splits among the different clusters.
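For illustration, here is a minimal sketch of the general approach, not the PR's actual implementation: the `sketch_data_clusters` helper and its parameter names are hypothetical, and the outlier step simply draws uniform noise over the bounding box of the inliers rather than distributing outliers per cluster as the PR does.

```python
import numpy as np
from sklearn.datasets import make_blobs


def sketch_data_clusters(n_samples=300, n_clusters=3, contamination=0.1,
                         random_state=42):
    """Hypothetical helper: clusters of varying size/density plus outliers."""
    rng = np.random.RandomState(random_state)

    # Split the sample budget between inliers and outliers via contamination.
    n_outliers = int(n_samples * contamination)
    n_inliers = n_samples - n_outliers

    # Different sizes: divide the inliers unevenly across the clusters.
    weights = rng.dirichlet(np.ones(n_clusters))
    sizes = (n_inliers * weights).astype(int)
    sizes[0] += n_inliers - sizes.sum()  # keep the total exact

    # Different densities: a distinct standard deviation per cluster.
    stds = rng.uniform(0.3, 1.5, size=n_clusters)

    X, _ = make_blobs(n_samples=list(sizes), cluster_std=list(stds),
                      center_box=(-10.0, 10.0), random_state=random_state)

    # Outliers: uniform noise over the data range (simplified here; the PR
    # distributes the outliers among the clusters).
    X_out = rng.uniform(low=X.min(axis=0), high=X.max(axis=0),
                        size=(n_outliers, X.shape[1]))
    X = np.vstack([X, X_out])

    # Ground-truth labels, PyOD convention: 0 = inlier, 1 = outlier.
    y = np.hstack([np.zeros(n_inliers), np.ones(n_outliers)])
    return X, y
```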
The code is well documented, and I believe the inline comments make it easy to follow.
As mentioned in the related issue #66, having clusters of data with different sizes and densities makes outlier detection challenging, especially for algorithms based on k-nearest neighbors such as LOF, LDOF, LoOP, HiCS, and SOD.
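As a quick way to see this effect, one could run PyOD's existing LOF detector on data from the hypothetical helper sketched above (`sketch_data_clusters` is illustrative only, not part of this PR):

```python
from pyod.models.lof import LOF

X, y = sketch_data_clusters(n_samples=500, n_clusters=3, contamination=0.1)

clf = LOF(n_neighbors=20, contamination=0.1)
clf.fit(X)

# Fraction of points labelled correctly; kNN-based detectors tend to
# struggle when cluster densities differ widely, which is the point above.
print((clf.labels_ == y).mean())
```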