perf(adapter): temporarily hash large data when formatting chat prompt for speed+memory performance #8821
Summary
Problem
Building prompts in `Adapter` is slow when using large data types like `dspy.Image`. This gist is a realistic, reproducible example: a (1000, 1000) image, with 15 few-shot images. It profiles running a prediction for 100 images after warming up the cache: building 100 prompts takes 53 seconds, which is very slow for just constructing a prompt. This is especially noticeable when rerunning a cached prediction with a few hundred samples; e.g. in my workflow, I do this as part of dataset preprocessing.
There's also a memory issue: in the gist, if you look at the `example_RSS_increase.log` file, you'll see that RSS memory grows steadily from 1.192 GB to 6.5 GB over 100 samples. For a different (real) experiment, this issue eventually led to a `MemoryError` (however, the behaviour here did depend on the system: memory kept increasing on one server, but RSS was reclaimed on my laptop, so I'm less sure what's going on here).

Cause
In `adapters.base.Adapter`, the `format` function builds a prompt as one big string. So it stringifies large data (dspy types like `Image` and `Audio`) and concatenates it with the rest of the prompt. Since special data types are sent to the LM as separate items in the `messages` list, the method then searches the string to extract those data: that's the `split_message_content_for_custom_types` function that's taking all the runtime. This is slow because the strings are so big. There is also high memory demand, because each large image string is copied and concatenated to build the full prompt string.
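To make the cost concrete, here is a rough, self-contained sketch of the pattern (illustrative only, not DSPy's actual code): a base64-encoded 1000x1000 image is a multi-megabyte string, so concatenating 15 of them into one prompt and then regex-scanning the result touches tens of megabytes per prompt.

```python
import re

# Illustrative stand-in for a base64-encoded 1000x1000 image
# (real encodings of such images are also multi-MB strings).
fake_image = "data:image/png;base64," + "A" * 3_000_000

# The formatting step concatenates every field, including large
# payloads, into one prompt string, copying megabytes per field...
prompt = "\n".join(f"## field {i}\n{fake_image}" for i in range(15))

# ...and pulling the special data back out into separate `messages`
# items then requires scanning the whole ~45 MB string.
parts = re.split(r"(data:image/[a-z]+;base64,[A-Za-z0-9+/=]+)", prompt)
```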
Solution
Temporarily convert large data to a hash before doing the formatting. Then, after the `messages` list is built, replace the hash with the real data.

Key Changes
- Introduce a `LargePayloadHashManager` to be used in `Adapter.format()`. It has one function for replacing large objects inside `inputs` and `demos` with hashes, and one function for restoring the final `messages` with the full data (see the sketch after this list).
- Use the payload manager in `format`.
- Ensure that `encode_image()` accepts hash identifiers and returns them unchanged. This is required to avoid misclassifying hash tokens as invalid inputs.
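As a rough sketch of the idea (hypothetical names and simplified logic, not the literal diff):

```python
import hashlib

class LargePayloadHashManager:
    """Swap large payloads for short hash tokens before formatting,
    then swap them back once `messages` is built (simplified sketch)."""

    def __init__(self, size_threshold: int = 10_000):
        self.size_threshold = size_threshold
        self._store: dict[str, str] = {}  # token -> original payload

    def replace_payload(self, value: str) -> str:
        # Large strings become short, unique tokens, so formatting
        # only ever copies and scans small strings.
        if len(value) < self.size_threshold:
            return value
        digest = hashlib.sha256(value.encode()).hexdigest()
        token = f"<<dspy_payload_{digest}>>"
        self._store[token] = value
        return token

    def restore(self, text: str) -> str:
        # After `messages` is built, put the real data back in place
        # of each token.
        for token, value in self._store.items():
            text = text.replace(token, value)
        return text


def encode_image(image):
    # Sketch of the encode_image() change: hash tokens pass through
    # unchanged so they are not misclassified as invalid inputs.
    if isinstance(image, str) and image.startswith("<<dspy_payload_"):
        return image
    ...  # normal encoding path (omitted in this sketch)
```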
Results: faster performance

Runtime reduced from 53.3 seconds to 3.4 seconds, a roughly 94% reduction, though the exact numbers will change with the number and size of images.
(BTW, the major remaining bottleneck is now `cache.py`, which runs `dumps()` on the whole request and hashes it.)
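To illustrate that remaining cost (a simplified sketch, not DSPy's actual cache code), a cache key computed by serializing and hashing the entire request re-hashes every multi-megabyte payload on each call:

```python
import hashlib
import json

def cache_key(request: dict) -> str:
    # Simplified sketch: the full request, including any multi-MB
    # image payloads, is serialized and hashed on every cached call.
    blob = json.dumps(request, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

Extensions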