Hey everyone,
as stated in #466 and in #453, the empirical cumulative distribution function (ECDF) can be computed faster than with the statsmodels ECDF functionality.
This also makes the dependency on statsmodels obsolete, so this pull request removes it.
In this pull request the following things are done:
- Implementing a standalone ECDF estimator in pyod/utils/stat_models.py
- Writing a test that compares the new implementation to the statsmodels implementation on several random matrices (so statsmodels is still a requirement in requirements_ci.txt)
- Deleting and replacing the functionality in ECOD and COPOD (the only places this dependency has been used)
The implementation is now faster (by 30-60%), because we only ever evaluate the ECDF on the same data it was estimated from. Please get back to me if you need a further explanation of why exactly that helps. I will gladly elaborate.
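
To illustrate the idea, here is a minimal sketch, not the code from this pull request. It assumes a 2-D NumPy array with one feature per column, and `column_ecdf_sketch` is a hypothetical name. Because the ECDF is evaluated only at the sample points themselves, each value reduces to the count of entries less than or equal to it, divided by `n`, so no general step-function object has to be built:

```python
import numpy as np


def column_ecdf_sketch(X):
    """Evaluate each column's ECDF at that column's own values (illustrative only)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    out = np.empty((n, d))
    for j in range(d):
        col = X[:, j]
        sorted_col = np.sort(col)
        # side="right" counts the entries <= x, so tied values share one ECDF value.
        out[:, j] = np.searchsorted(sorted_col, col, side="right") / n
    return out


X = np.array([[1.0, 5.0],
              [3.0, 5.0],
              [2.0, 4.0],
              [3.0, 6.0]])
result = column_ecdf_sketch(X)
```

In the first column the two tied values of 3.0 both map to 1.0, and in the second column the two tied values of 5.0 both map to 0.75.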
Since not everyone may want to dive deep into the topic, I kept the statsmodels dependency in the test and compare this implementation to the statsmodels function on several random matrices. One could see that as proof that it works.
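
The shape of such a comparison can be sketched as follows. This is an assumption-laden illustration rather than the actual test: to stay self-contained it uses a brute-force reference (`ecdf_reference`, hypothetical) in place of statsmodels, and `ecdf_fast` is a hypothetical stand-in for the new estimator.

```python
import numpy as np


def ecdf_fast(X):
    """Hypothetical stand-in for the new estimator: sort once, then count via binary search."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    out = np.empty((n, d))
    for j in range(d):
        sorted_col = np.sort(X[:, j])
        out[:, j] = np.searchsorted(sorted_col, X[:, j], side="right") / n
    return out


def ecdf_reference(X):
    """Brute-force O(n^2) reference: fraction of column entries <= each value."""
    X = np.asarray(X, dtype=float)
    # Axis 0 indexes the evaluation point x_i, axis 1 the compared sample x_j.
    return (X[None, :, :] <= X[:, None, :]).mean(axis=1)


rng = np.random.default_rng(0)
all_match = True
for shape in [(50, 3), (200, 1), (10, 8)]:
    continuous = rng.normal(size=shape)                     # no ties expected
    with_ties = rng.integers(0, 10, size=shape).astype(float)  # many ties
    for X in (continuous, with_ties):
        all_match &= bool(np.allclose(ecdf_fast(X), ecdf_reference(X)))
```

Including integer-valued matrices matters here, since ties are the case where a rank-based shortcut is most likely to disagree with the reference.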
Thanks in advance! :-)
All Submissions Basics:
- Have you followed the guidelines in our Contributing document?
- Have you checked to ensure there aren't other open Pull Requests for the same update/change?
- Have you checked all Issues to tie the PR to a specific one?
All Submissions Cores:
- Have you added an explanation of what your changes do and why you'd like us to include them?
- Have you written new tests for your core changes, as applicable?
- Have you successfully run tests with your changes locally?
- Does your submission pass tests, including CircleCI, Travis CI, and AppVeyor?
- Does your submission have appropriate code coverage? The cutoff threshold is 95% by Coveralls.