feat: limit message context sent to llm #66
Merged
Problem
In an ideal world, we would send as much context to the LLM as possible so it has the most information available to respond to your questions/tasks. Unfortunately, cumulative token costs grow quadratically with the length of the chat, since every request has to resend all previous messages.
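To make the growth concrete, here is a quick back-of-the-envelope sketch (the numbers are hypothetical, not measured from this app):

```ts
// Rough cost model: each request resends the full history, so if every
// message averages t tokens, request i carries roughly i * t tokens.
// The total over n messages is t * n * (n + 1) / 2 — quadratic in n.
const tokensPerMessage = 100; // hypothetical average

function cumulativeTokens(messageCount: number): number {
  return (tokensPerMessage * messageCount * (messageCount + 1)) / 2;
}

console.log(cumulativeTokens(30));  // 46_500 tokens after 30 messages
console.log(cumulativeTokens(300)); // 4_515_000 — ~100x the cost for 10x the messages
```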
Solution
To keep token costs from ballooning, we can cap the number of previous messages sent to the LLM as context on each request.
Important: this does not mean the actual message history is limited on the frontend. It strictly refers to the sliding window of the past X messages sent to the LLM as context each time.
This PR sets the message context limit to 30 messages. The consequence of this change is that the model has no memory of messages older than 30 messages, so it will be unable to answer questions about, or refer to information from, more than 30 messages back.
30 messages was chosen as enough context to help the user accomplish whatever task they are working on in the moment, without resending irrelevant/stale messages every time, which would add a large cost for little value.
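For reference, the trimming itself can be as simple as a slice over the message list. A minimal sketch of the sliding window (the `Message` shape and `MAX_CONTEXT_MESSAGES` name are illustrative, not the actual identifiers in this PR):

```ts
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Sliding window: only the most recent MAX_CONTEXT_MESSAGES messages are
// sent to the LLM; the full history stays untouched on the frontend.
const MAX_CONTEXT_MESSAGES = 30;

function buildLlmContext(history: Message[]): Message[] {
  // slice with a negative index returns the final N elements
  // (or fewer, if the history is shorter than the window).
  return history.slice(-MAX_CONTEXT_MESSAGES);
}
```

A real implementation would likely also pin the system prompt outside the window so it never falls out of context.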
Future
In the future, we could summarize old messages before trimming the context so the model at least has some history to refer to. Rewriting message history can get tricky, though, so this will need more thought.
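As a rough illustration of the summarize-before-trim idea (everything here is hypothetical and not part of this PR; it reuses the `Message` type and `MAX_CONTEXT_MESSAGES` constant from the sketch above, and `summarize` stands in for e.g. a cheap LLM call):

```ts
// Hypothetical: compress messages that fall outside the window into a single
// synthetic message, so the model keeps a lossy memory of older turns.
async function buildContextWithSummary(
  history: Message[],
  summarize: (messages: Message[]) => Promise<string>,
): Promise<Message[]> {
  if (history.length <= MAX_CONTEXT_MESSAGES) return history;

  const older = history.slice(0, -MAX_CONTEXT_MESSAGES);
  const recent = history.slice(-MAX_CONTEXT_MESSAGES);
  const summary = await summarize(older);

  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```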