Training multiple random forests on common data set #256

Closed
@mjmckp

Description

I am attempting to train random forests on a fixed 6 GB feature set, with N different labels and M different random forest parameter settings, i.e. N * M models in total. The overwhelming majority of the wall-clock time appears to be spent in the disk transpose operation, which runs N * M times when it should ideally run only once, since the feature set is common to all models.

To rectify this, is there any way to either:

  • train multiple random forests in the same pipeline, or
  • share the transposed data object between multiple training pipelines?

(A rough sketch of the workflow I'm after is below.)
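For illustration, here is a minimal sketch of the intended access pattern, written against NumPy and scikit-learn as stand-ins (this project's pipeline API differs, and the names and shapes here are made up): the feature matrix is prepared once, and the same in-memory object is reused for every (label, parameter-setting) pair instead of being re-transposed N * M times.

```python
# Sketch only: scikit-learn used as a stand-in for the actual library.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Shared feature set, loaded/transposed once (hypothetical shapes).
X = rng.random((10_000, 100))

# N label columns and M parameter settings (illustrative values).
labels = [rng.integers(0, 2, 10_000) for _ in range(3)]
param_grid = [{"n_estimators": 50}, {"n_estimators": 200}]

models = {}
for i, y in enumerate(labels):                # N labels
    for j, params in enumerate(param_grid):   # M parameter settings
        # X is reused as-is; no per-model re-transform of the features.
        models[(i, j)] = RandomForestClassifier(**params).fit(X, y)
```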

Metadata

Labels

perf (Performance and Benchmarking related), question (Further information is requested)
