perf(adapter): temporarily hash large data when formatting chat prompt for speed+memory performance #8821
Summary
Problem
Building prompts in `Adapter` is slow when using large data types like `dspy.Image`. This gist is a realistic, reproducible example: a (1000, 1000) image, with 15 few-shot images. It profiles running a prediction for 100 images after warming up the cache: building 100 prompts takes 53 seconds, which is very slow for just constructing a prompt. This is especially noticeable when rerunning a cached prediction with a few hundred samples; e.g. in my workflow, I do this as part of dataset preprocessing.
There's also a memory issue: in the gist, if you look at the `example_RSS_increase.log` file, you'll see that RSS memory grows steadily from 1.192 GB to 6.5 GB over 100 samples. For a different (real) experiment, this issue eventually led to a `MemoryError` (however, the behaviour here did depend on the system: memory kept increasing on one server, but RSS was reclaimed on my laptop, so I'm less sure what's going on here).

Cause
In `adapters.base.Adapter`, the `format` function builds a prompt as one big string. So it stringifies large data (dspy types like `Image` and `Audio`) and concatenates it with the rest of the prompt. Since special data types are sent to the LM as separate items in the `messages` list, the method then searches the string to extract those data: that's the `split_message_content_for_custom_types` function that's taking all the runtime. This is slow because the strings are so big. There is also high memory demand, because each large image string is copied and concatenated to build the full prompt string.
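To make the cost concrete, here is a rough, self-contained sketch of the pattern (illustrative only, not DSPy's actual code): a base64-encoded 1000x1000 image is a multi-megabyte string, so concatenating 15 of them into one prompt and then regex-scanning the result touches tens of megabytes per prompt.

```python
import re

# Illustrative stand-in for a base64-encoded 1000x1000 image
# (real encodings of such images are also multi-MB strings).
fake_image = "data:image/png;base64," + "A" * 3_000_000

# The formatting step concatenates every field, including large
# payloads, into one prompt string, copying megabytes per field...
prompt = "\n".join(f"## field {i}\n{fake_image}" for i in range(15))

# ...and pulling the special data back out into separate `messages`
# items then requires scanning the whole ~45 MB string.
parts = re.split(r"(data:image/[a-z]+;base64,[A-Za-z0-9+/=]+)", prompt)
```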
Solution
Temporarily convert large data to a hash before doing the formatting. Then, after the `messages` list is built, replace the hash with the real data.

Key Changes
- Introduce a `LargePayloadHashManager` to be used in `Adapter.format()`. It has one function for replacing large objects inside `inputs` and `demos` with hashes, and one function for restoring the final `messages` with the full data (see the sketch after this list).
- Use the payload manager in `format`.
- Ensure that `encode_image()` accepts hash identifiers and returns them unchanged. This is required to avoid misclassifying hash tokens as invalid inputs.
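As a rough sketch of the idea (hypothetical names and simplified logic, not the literal diff):

```python
import hashlib

class LargePayloadHashManager:
    """Swap large payloads for short hash tokens before formatting,
    then swap them back once `messages` is built (simplified sketch)."""

    def __init__(self, size_threshold: int = 10_000):
        self.size_threshold = size_threshold
        self._store: dict[str, str] = {}  # token -> original payload

    def replace_payload(self, value: str) -> str:
        # Large strings become short, unique tokens, so formatting
        # only ever copies and scans small strings.
        if len(value) < self.size_threshold:
            return value
        digest = hashlib.sha256(value.encode()).hexdigest()
        token = f"<<dspy_payload_{digest}>>"
        self._store[token] = value
        return token

    def restore(self, text: str) -> str:
        # After `messages` is built, put the real data back in place
        # of each token.
        for token, value in self._store.items():
            text = text.replace(token, value)
        return text


def encode_image(image):
    # Sketch of the encode_image() change: hash tokens pass through
    # unchanged so they are not misclassified as invalid inputs.
    if isinstance(image, str) and image.startswith("<<dspy_payload_"):
        return image
    ...  # normal encoding path (omitted in this sketch)
```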
Results: faster performance

Runtime reduced from 53.3 seconds to 3.4 seconds, a roughly 94% reduction, though the exact numbers will change with the number and size of images.
(BTW, the major remaining bottleneck is now `cache.py`, which runs `dumps()` on the whole request and hashes it.)
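To illustrate that remaining cost (a simplified sketch, not DSPy's actual cache code), a cache key computed by serializing and hashing the entire request re-hashes every multi-megabyte payload on each call:

```python
import hashlib
import json

def cache_key(request: dict) -> str:
    # Simplified sketch: the full request, including any multi-MB
    # image payloads, is serialized and hashed on every cached call.
    blob = json.dumps(request, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

Extensions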