Description
🚀 Describe the new functionality needed
Today the /v1/models endpoint lists all the models discovered at the provider, as well as any models explicitly configured in the run.yaml. While this makes for an easy "getting started" experience, it makes it harder for client implementations that want to rely on backend infrastructure to tell them which models should be used based on corporate policy or other requirements.
I think it would be helpful to have a flag to enable/disable including the auto-discovered models in the models list API. I don't have an opinion on whether the default should be on or off, but defaulting it to on would preserve existing behavior.
Another option could be to introduce a second API, or a parameter on the existing API, to control whether you get "all the models" or only the "explicitly configured" models.
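As a rough sketch of the behavior being asked for (this is not existing llamastack code; `ModelEntry`, `explicitly_configured`, and `include_discovered` are hypothetical names made up for illustration), the filtering could look something like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical stand-in for an entry returned by the models list."""
    identifier: str
    provider_id: str
    # Assumption: the server can tell whether a model came from run.yaml
    # or was auto-discovered at the provider.
    explicitly_configured: bool


def list_models(entries, include_discovered=True):
    """Return the models to expose.

    include_discovered=True keeps today's behavior (everything is listed);
    False returns only the models explicitly configured by the admin.
    """
    if include_discovered:
        return list(entries)
    return [m for m in entries if m.explicitly_configured]


# Example registry: one configured model, one discovered at the provider.
registry = [
    ModelEntry("llama-3-8b", "provider-a", explicitly_configured=True),
    ModelEntry("some-other-model", "provider-a", explicitly_configured=False),
]

all_models = list_models(registry)
configured_only = list_models(registry, include_discovered=False)
```

The same predicate would work whether the switch lives in the server config (a flag) or is passed per request (a parameter on the existing API); only where `include_discovered` comes from changes.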
💡 Why is this needed? What if we don't build it?
Without this feature, any client that talks to a llamastack instance needs its own config and filtering logic to ensure it only uses the intended models and not other models that might be discovered at a provider. In some cases this might be sensible, but in other cases requiring every client implementation to reimplement that logic (and keep those configurations consistent across client deployments) is more effort than allowing an admin to configure the llamastack instance with the models that should be available.
And since provider/model configuration must be done in llamastack anyway (to supply credentials, endpoints, etc.), splitting and duplicating the model/provider config between llamastack and clients feels like a worse experience than being able to control it all centrally in the llamastack config.
Other thoughts
No response