One chat prompt/template should be customizable from runtime - the prompt I need atm #816

Comments
What if multiple users are using the server? Does it mean everyone gets to modify the same shared runtime prompt template, or does the server have to keep a different prompt template for each user?
If that is a problem, perhaps one could limit usage with a feature flag (see the sketch below). This would make sense especially since I suspect that by the time a model hits production, a static version of the prompt would have been implemented. Do you agree?
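A minimal sketch of the feature-flag idea, assuming an environment-variable flag; the variable name, function, and store are all hypothetical, not part of llama-cpp-python:

```python
import os
from typing import Dict

# Hypothetical flag: runtime template editing must be enabled explicitly.
ALLOW_RUNTIME_TEMPLATES = os.environ.get("LLAMA_ALLOW_RUNTIME_TEMPLATES", "0") == "1"

RUNTIME_TEMPLATES: Dict[str, str] = {}  # hypothetical per-model template store

def set_chat_template(model: str, template: str) -> None:
    if not ALLOW_RUNTIME_TEMPLATES:
        raise PermissionError("Runtime template editing is disabled")
    RUNTIME_TEMPLATES[model] = template
```

In production the flag would stay off, so the static, code-registered templates remain the only ones in play.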
Looks like you can build upon or contribute to this similar effort: #809
@pervrosen do you know the current behavior of --chat_format on the /v1/completions endpoint? It defaults to "llama-2". Does that mean every prompt sent to the endpoint automatically has the llama-2 prompt format applied?
Feels like the chat_format parameter could just be a Jinja template: then just execute it, and the user can specify whatever they want (for example, the "mistral openorca" format).
If everyone is cool with that, I will add it (I will detect a Jinja template passed as the chat format param and execute it, while still supporting the Python-registered formats as-is). Then we can add the template to GGUF files in the metadata (absorbing it during convert.py), and be done with having to think about templates!
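A minimal sketch of that proposal, assuming jinja2 is installed; the render_chat function, the registry, and the template-detection heuristic are hypothetical, not llama-cpp-python's actual API. Mistral OpenOrca uses a ChatML-style format, which the usage example approximates:

```python
from typing import Callable, Dict, List

import jinja2

# Python-registered formatters keep working as before (hypothetical registry).
REGISTERED_FORMATS: Dict[str, Callable[[List[dict]], str]] = {}

def render_chat(chat_format: str, messages: List[dict]) -> str:
    # Crude heuristic: anything containing Jinja delimiters is a template.
    if "{{" in chat_format or "{%" in chat_format:
        env = jinja2.Environment(loader=jinja2.BaseLoader(), autoescape=False)
        return env.from_string(chat_format).render(messages=messages)
    # Otherwise fall back to a registered Python format such as "mistral".
    return REGISTERED_FORMATS[chat_format](messages)

# Usage with a ChatML-style template (the style Mistral OpenOrca uses):
chatml = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "<|im_start|>assistant\n"
)
print(render_chat(chatml, [{"role": "user", "content": "Hello!"}]))
```

Storing such a template in the GGUF metadata would then let the format travel with the model file.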
@earonesty I think this is exactly what they are discussing in #809 (linked in a comment above). Perhaps you can contribute on the choice of architectural pattern that seems to be under debate there.
Is your feature request related to a problem? Please describe.
Given the rapid evolution of LLMs and their varied use of prompts, we need a way to specify a prompt at runtime, not only in code, e.g. mistral -> mistral+orca -> mistral-zephyr within a week.
Describe the solution you'd like
An endpoint to override a prompt, or to add a runtime prompt, for the model I am currently evaluating (a sketch follows below).
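A minimal sketch of what such an endpoint could look like, assuming a FastAPI-based server (which llama-cpp-python's server is); the route, request model, and template store are hypothetical:

```python
from typing import Dict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

RUNTIME_TEMPLATES: Dict[str, str] = {}  # hypothetical store the formatters read from

class TemplateUpdate(BaseModel):
    model: str     # which model the template applies to
    template: str  # e.g. "{system_message}" or a full Jinja chat template

@app.post("/v1/internal/chat_template")
def set_chat_template(update: TemplateUpdate) -> dict:
    RUNTIME_TEMPLATES[update.model] = update.template
    return {"status": "ok", "model": update.model}
```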
Describe alternatives you've considered
Submitting a pull request for each and every LLM I encounter, OR polling the huggingface.co/docs/transformers/main/en/chat_templating feature for the model in question, but that introduces one more dependency.
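For reference, the transformers chat-templating feature mentioned above works roughly like this (the model name is just an example):

```python
from transformers import AutoTokenizer

# Each model ships its own chat template with the tokenizer.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```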
Additional context
An example: first, I run everything in a docker/k8s setup. I call an endpoint to specify the prompt, if it is not already specified in code. That populates a runtime version of
```python
from typing import Any, List

from llama_cpp import llama_types
# Helpers from llama_cpp.llama_chat_format in the version under discussion.
from llama_cpp.llama_chat_format import (
    ChatFormatterResponse,
    _format_no_colon_single,
    _get_system_message,
    _map_roles,
    register_chat_format,
)

@register_chat_format("mistral")
def format_mistral(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _roles = dict(user="[INST] ", assistant="[/INST]")
    _sep = " "
    system_template = """{system_message}"""  # the part to make runtime-configurable
    system_message = _get_system_message(messages)
    system_message = system_template.format(system_message=system_message)
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))
    _prompt = _format_no_colon_single(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)
```
that I can then use when calling the model.
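For illustration, a hypothetical runtime-backed variant of the formatter above, reusing the imports from the previous block and the RUNTIME_TEMPLATES store sketched earlier (the registration name and the store are assumptions, not existing code):

```python
@register_chat_format("mistral-runtime")
def format_mistral_runtime(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _roles = dict(user="[INST] ", assistant="[/INST]")
    # The only change from format_mistral: the system template is read from
    # the runtime store, falling back to the hard-coded default.
    system_template = RUNTIME_TEMPLATES.get("mistral", "{system_message}")
    system_message = system_template.format(
        system_message=_get_system_message(messages)
    )
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))
    _prompt = _format_no_colon_single(system_message, _messages, " ")
    return ChatFormatterResponse(prompt=_prompt)
```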