Description
I am attempting to train random forests on a fixed 6GB data set of features, with N different labels, using M different random forest parameter settings. Overwhelmingly, the time taken appears to be dominated by the disk transpose operation, which occurs N * M times, when ideally it should only happen once (as the feature set is common to all models).
To rectify this, is there any way to either:
- train multiple random forests in the same pipeline, or,
- share the transposed data object between multiple training pipelines?
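To make the intent concrete, here is a minimal sketch of the behaviour I'm after, using scikit-learn's RandomForestClassifier purely as a stand-in (the array sizes, label count, and parameter grid are placeholders, not the real workload): the feature matrix is materialised once, and every fit reuses the same in-memory object, so the per-model transpose cost is paid only once instead of N * M times.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the 6GB feature set: built (and transposed) exactly once.
rng = np.random.default_rng(0)
X = rng.random((1000, 50))

# N different label vectors and M different parameter settings (placeholders).
labels = {f"label_{i}": rng.integers(0, 2, size=1000) for i in range(3)}
param_grid = [
    {"n_estimators": 100},
    {"n_estimators": 300, "max_depth": 10},
]

# Train N * M forests, each reusing the same in-memory X.
models = {}
for label_name, y in labels.items():
    for params in param_grid:
        key = (label_name, tuple(sorted(params.items())))
        models[key] = RandomForestClassifier(**params).fit(X, y)
```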