
[FT] Enable lazy model initialization #496


Description

@JoelNiklaus

Issue encountered

Evaluating large models (> 30B parameters) is hard, especially with limited hardware. When many metrics need to be computed, the time an expensive machine has to stay running grows significantly. For example, when I evaluate a 70B model on a large dataset and then compute many LLM-judge metrics, it can occupy a 4xA100 machine for days, incurring significant cost. The GPUs are only actually active during the first few hours for inference; afterwards they just sit idle.

Solution/Feature

Therefore, ideally, we would like to run inference with just one metric and save the results to the details files. In a second step, we would load the responses from the details files and run only the metrics, which can be done on a significantly smaller machine. Loading from the details files is being added in PR #488. However, to evaluate the metrics we currently still need to load the entire model into memory, defeating the purpose. Loading the model only right before it actually runs would alleviate this issue; a sketch of what that could look like follows.
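A minimal sketch of the lazy pattern, assuming a transformers-style loader. The `LazyModel` class and its `generate` signature are illustrative, not lighteval's actual API:

```python
from functools import cached_property

from transformers import AutoModelForCausalLM, AutoTokenizer


class LazyModel:
    """Defers loading model weights until the first inference call."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    @cached_property
    def model(self):
        # Weights are materialized only on first access,
        # i.e. right before inference actually runs.
        return AutoModelForCausalLM.from_pretrained(self.model_name)

    @cached_property
    def tokenizer(self):
        return AutoTokenizer.from_pretrained(self.model_name)

    def generate(self, prompt: str, **gen_kwargs) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt")
        output_ids = self.model.generate(**inputs, **gen_kwargs)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A metrics-only run that replays responses from the details files never accesses `self.model`, so the weights are never pulled into memory.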

Possible alternatives

Alternatively, we could mock the model, so that the metrics pipeline runs without ever loading the weights; see the sketch below.
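A mock would make the metrics-only path explicit by failing loudly if anything still tries to run inference. A minimal sketch (the `MockModel` name and `generate` signature are again illustrative):

```python
class MockModel:
    """Stand-in used when responses are replayed from details files."""

    def __init__(self, model_name: str):
        # Keep the name so configs and result files still report it.
        self.model_name = model_name

    def generate(self, prompt: str, **gen_kwargs) -> str:
        # Metrics-only runs must never reach this point; raising here
        # surfaces any code path that still tries to run inference.
        raise RuntimeError(
            "MockModel cannot generate; responses must come from the details files."
        )
```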
