llama : refactor model loading code

In `llama.cpp` we have logic for supporting some very old model formats and features, such as sharded models, which makes the code unnecessarily complicated and difficult to maintain. We should simplify it and remove support for old formats and features that are no longer used.

Additionally, with the upcoming unified file format (https://github.com/ggerganov/ggml/issues/220), we will have to look into reimplementing the loading code to use it and to add support for loading non-LLaMA models as well. This will be an important step towards supporting inference of new models such as MPT and Falcon. Therefore, simplifying the logic as much as possible now will make it easier to adopt the new unified file format when it is ready.

llama : refactor model loading code #1991

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

llama : refactor model loading code #1991

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions