This PR parallelizes the `fit` and `decision_function` methods of `FeatureBagging`. The earlier implementation only used `n_jobs` when the `base_estimator` parameter was `None`. Apart from fixing that, this PR parallelizes at the model level, a coarser granularity than before, which noticeably improves performance.
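To illustrate what model-level parallelism means here, below is a minimal sketch that splits the estimators into chunks and hands each chunk to a worker. It is not the code in this PR: the helper names (`partition_estimators`, `fit_chunk`, `parallel_fit`) are hypothetical, the "fit" step is a placeholder that only draws a feature subset (assumed to contain between d/2 and d-1 of the d features), and it uses the standard library's thread pool only to stay self-contained. A real implementation would likely use a process-based backend such as joblib for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def partition_estimators(n_estimators, n_jobs):
    """Split n_estimators into at most n_jobs nearly equal chunk sizes."""
    n_jobs = min(n_jobs, n_estimators)
    base, extra = divmod(n_estimators, n_jobs)
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]


def fit_chunk(n_models, n_features, seed):
    """Hypothetical worker: 'fit' n_models detectors on random feature subsets."""
    rng = random.Random(seed)
    fitted = []
    for _ in range(n_models):
        # Assumption: each detector sees between d/2 and d-1 features.
        k = rng.randint(n_features // 2, n_features - 1)
        features = sorted(rng.sample(range(n_features), k))
        # Placeholder: a real worker would fit a base detector on X[:, features].
        fitted.append(features)
    return fitted


def parallel_fit(n_estimators, n_features, n_jobs):
    """Fit all estimators, one chunk of estimators per worker."""
    chunks = partition_estimators(n_estimators, n_jobs)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        results = pool.map(
            fit_chunk,
            chunks,
            [n_features] * len(chunks),
            range(len(chunks)),  # one seed per chunk
        )
        # Flatten the per-chunk lists back into one list of models.
        return [model for chunk in results for model in chunk]


models = parallel_fit(n_estimators=20, n_features=8, n_jobs=4)
```

Each worker handles a whole batch of estimators, so the scheduling overhead is paid once per chunk rather than once per inner operation of a single estimator.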
Benchmark results using `n_estimators=20` and `base_estimator=None`, averaged over 3 runs. Values are `fit` times in seconds; values in parentheses are `decision_function` times:
Dataset (shape) | Orig (n_jobs=1) | Orig (n_jobs=4) | This PR (n_jobs=4) |
---|---|---|---|
pima (768, 8) | 0.19 (0.094) | 2.30 (2.155) | 0.64 (0.63) |
vowels (1456, 12) | 0.71 (0.42) | 2.36 (2.17) | 0.66 (0.64) |
pendigits (6870, 16) | 9.12 (5.02) | 5.87 (4.32) | 1.78 (1.42) |
musk (3062, 166) | 18.92 (8.32) | 7.46 (5.88) | 3.90 (2.79) |
shuttle (49097, 9) | 59.09 (38.67) | 46.10 (28.11) | 33.43 (18.01) |
For the smaller datasets (e.g. pima), this PR with `n_jobs=4` can be slower than the original single-process run, but I think that is expected given the parallelization overhead.
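For a quick read of the table, the following sketch computes the `fit` speedup of this PR (`n_jobs=4`) over the original single-process run (`n_jobs=1`). The numbers are copied from the table above; the variable names are only illustrative.

```python
# fit times in seconds: (Orig n_jobs=1, This PR n_jobs=4), taken from the table
fit_times = {
    "pima": (0.19, 0.64),
    "vowels": (0.71, 0.66),
    "pendigits": (9.12, 1.78),
    "musk": (18.92, 3.90),
    "shuttle": (59.09, 33.43),
}

# Speedup > 1 means this PR is faster than the single-process baseline.
speedups = {name: orig / pr for name, (orig, pr) in fit_times.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```

A value below 1 (as for pima) means the parallel run is slower than the single-process baseline.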
Please let me know if further changes are needed. Thanks.