Improve performance of output cache #48328

Merged: mgravell merged 5 commits into main on Aug 3, 2023

Conversation

mgravell
Member

@mgravell mgravell commented May 19, 2023

Improve performance of output cache

Reduce allocations and CPU impact of output cache by making changes to the internal implementation (numbers and evidence at bottom)

Description

The output cache feature is complex; to understand the changes, first we need to discuss the implementation:

  • the middleware adds an OutputCacheContext / feature
  • this attempts to fetch an appropriate OutputCacheEntry to use for the existing payload
    • normally, it gets this via a byte[] returned from the IOutputCacheStore, which it loads into a MemoryStream with BinaryReader, and manually deserializes into a FormatterEntry (parsing the headers into strings, etc - note that the original segments are reconstructed into new byte[] copies of the data)
    • FormatterEntry has a headers Dictionary<string, string?[]>, a string[] tags, and a list of byte[] buffers
    • it then rewrites this into an OutputCacheEntry which has a HeaderDictionary?, the same string[], and a CachedResponseBody (which wraps a List<byte[]>)
  • if it is unable to get a matching entry, the output stream is shimmed via an OutputCacheStream which writes to the underlying stream but also duplicates the data to a SegmentWriteStream (which wraps a List<byte[]>) and uses this to write a new OutputCacheEntry
    • it may then need to serialize this, so it forms a FormatterEntry from the composed OutputCacheEntry, writes this via MemoryStream and BinaryWriter to a byte[], which it passes to the IOutputCacheStore

Key observations:

  • most of the types cited are internal; we have a lot of scope for change
  • all of the byte[] mentioned are dropped on the floor
  • the only time we actually need a right-sized byte[] is for the IOutputCacheStore API; this will be addressed separately
  • FormatterEntry and OutputCacheEntry are basically the same thing
  • lots of intermediate wrapper objects - List<T>, BinaryReader, BinaryWriter, MemoryStream
  • the List<T> and MemoryStream use array-doubling internally, so may involve multiple array allocations behind the scenes
  • BinaryReader / BinaryWriter may use their own internal buffers
  • the multiple dictionaries and arrays for headers are redundant; this is pure storage - we can use simpler constructs

Work items:

merge FormatterEntry and OutputCacheEntry into a simpler store

we throw away FormatterEntry, and simplify OutputCacheEntry; we throw away the tags completely (they are not needed in the storage model), use ReadOnlyMemory<(string, StringValues)> for the headers (allowing us to use oversized leased arrays), and ReadOnlySequence<byte> for the body payload (deleting CachedResponseBody)

for the very few places where headers on the OutputCacheEntry are inspected, we add a TryFindHeader API; this is incredibly rare, so a few O(N) scans will be much more efficient than paying the same O(N) cost to build a dictionary (plus additional storage), for only a few O(1) fetches
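the trade-off above can be sketched as follows; this is a minimal illustration with simplified types (the real entry stores ReadOnlyMemory<(string, StringValues)>; plain strings stand in for StringValues here), and the TryFindHeader below is modeled on the description above, not copied from the PR:

```csharp
using System;

// usage: a handful of lookups against a small header set
var headers = new (string Name, string Value)[]
{
    ("Content-Type", "text/html"),
    ("Content-Length", "42"),
};

Console.WriteLine(TryFindHeader(headers, "content-length", out var length)
    ? $"found: {length}"
    : "not found");

// linear scan: cheaper overall than building a dictionary when
// only one or two headers are ever looked up per entry
static bool TryFindHeader(ReadOnlyMemory<(string Name, string Value)> headers,
    string name, out string? value)
{
    foreach (var (candidate, candidateValue) in headers.Span)
    {
        if (string.Equals(candidate, name, StringComparison.OrdinalIgnoreCase))
        {
            value = candidateValue;
            return true;
        }
    }
    value = null;
    return false;
}
```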

introduce a custom recyclable sequence API

the new RecyclableReadOnlySequenceSegment : ReadOnlySequenceSegment<byte> allows ReadOnlySequence<byte> to be constructed cheaply, using two definitions of recycling:

  • a small number of segments is kept via a pool, allowing chain details to be recycled
  • optionally, recycling a chain may also attempt to detect buffers that can be recycled
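the two definitions of recycling can be sketched roughly like this; this is a toy version under stated assumptions (a single-slot [ThreadStatic] spare instead of a real pool, hypothetical type name), not the PR's actual RecyclableReadOnlySequenceSegment:

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

var first = PooledSegment.Create(new byte[] { 1, 2, 3 }, previous: null);
var last = PooledSegment.Create(new byte[] { 4, 5 }, previous: first);
var sequence = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
Console.WriteLine(sequence.Length); // 5

// recycle the chain details (segment objects); buffers here are ordinary
// arrays, so we don't attempt to return them to the ArrayPool
PooledSegment.RecycleChain(first, returnBuffersToPool: false);

sealed class PooledSegment : ReadOnlySequenceSegment<byte>
{
    [ThreadStatic] private static PooledSegment? _spare; // toy single-slot pool

    public static PooledSegment Create(ReadOnlyMemory<byte> memory, PooledSegment? previous)
    {
        var segment = _spare ?? new PooledSegment();
        _spare = null;
        segment.Memory = memory;
        if (previous is not null)
        {
            segment.RunningIndex = previous.RunningIndex + previous.Memory.Length;
            previous.Next = segment;
        }
        return segment;
    }

    public static void RecycleChain(PooledSegment? head, bool returnBuffersToPool)
    {
        while (head is not null)
        {
            var next = (PooledSegment?)head.Next;
            // optionally detect array-backed buffers that can be recycled
            if (returnBuffersToPool && MemoryMarshal.TryGetArray(head.Memory, out ArraySegment<byte> array))
            {
                ArrayPool<byte>.Shared.Return(array.Array!);
            }
            head.Memory = default;
            head.Next = null;
            head.RunningIndex = 0;
            _spare = head; // keep a segment around for reuse
            head = next;
        }
    }
}
```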

implement a v2 serialization format

fortunately the existing format included a version preamble, so we can add a v2 effortlessly; key differences in v2:

  1. instead of storing segment subdata, just store the payload length, and write a single contiguous body payload
  2. test header name/values against a hard-coded set of known values; if found, store an integer key rather than the length-prefixed UTF8 data
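the header-name half of difference 2 can be sketched like this; the well-known table below is illustrative, not the actual hard-coded set the PR ships, and BinaryWriter stands in for the custom writer:

```csharp
using System;
using System.IO;

// illustrative well-known table (assumption: not the PR's actual list)
string[] commonHeaders =
{
    "Cache-Control", "Content-Encoding", "Content-Length",
    "Content-Type", "ETag", "Vary",
};

using var ms = new MemoryStream();
using var writer = new BinaryWriter(ms);
WriteHeaderName(writer, "Content-Type"); // known: single-byte integer key
WriteHeaderName(writer, "X-Custom");     // unknown: sentinel + raw string
Console.WriteLine(ms.Length);

// v2 idea: known names become a small integer key; unknown names fall back
// to the v1-style length-prefixed UTF-8 string after a 0 sentinel
void WriteHeaderName(BinaryWriter writer, string name)
{
    for (int i = 0; i < commonHeaders.Length; i++)
    {
        if (string.Equals(commonHeaders[i], name, StringComparison.OrdinalIgnoreCase))
        {
            writer.Write7BitEncodedInt(i + 1); // reserve 0 as "not known"
            return;
        }
    }
    writer.Write7BitEncodedInt(0);
    writer.Write(name); // BinaryWriter length-prefixes the UTF-8 payload
}
```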

implement custom reader/writer for the serializer

specifically, we introduce new ref struct reader/writer that are wire-compatible with BinaryReader/BinaryWriter, but minimal; the reader works directly on the byte[] returned from IOutputCacheStore (later we may want to make this work against ReadOnlySequence<byte> for a revised API, but we can defer this), using oversized leased arrays for the headers; if parsing pre-existing v1 data, the body uses the leased segment chain as mentioned above but pointing directly at the original byte[] data (which is never reused, so: safe); for v2 data, the body uses a single-segment (no chain) sequence of the payload bytes

the writer targets an IBufferWriter<byte>, and in particular a new RecyclableArrayBufferWriter<byte> which uses leased arrays (using the single-array model as per ArrayBufferWriter<byte>); we still need a byte[] ToArray() API to satisfy the IOutputCacheStore demands (we can address that later)
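a compressed sketch of the writer shape, with hypothetical names (the PR's actual types differ); the 7-bit varint encoding shown is what makes the output wire-compatible with BinaryWriter:

```csharp
using System;
using System.Buffers;
using System.Text;

var buffer = new ArrayBufferWriter<byte>();
var writer = new FormatterWriter(buffer);
writer.Write7BitEncodedInt(300); // 0xAC 0x02 - same bytes BinaryWriter emits
writer.WriteString("hi");        // length prefix + UTF-8 payload
Console.WriteLine(BitConverter.ToString(buffer.WrittenSpan.ToArray())); // AC-02-02-68-69

// minimal writer over IBufferWriter<byte>: no MemoryStream, no BinaryWriter,
// no intermediate buffers beyond what the IBufferWriter hands out
ref struct FormatterWriter
{
    private readonly IBufferWriter<byte> _target;

    public FormatterWriter(IBufferWriter<byte> target) => _target = target;

    public void Write7BitEncodedInt(int value)
    {
        Span<byte> scratch = stackalloc byte[5];
        uint remaining = (uint)value;
        int length = 0;
        while (remaining >= 0x80)
        {
            scratch[length++] = (byte)(remaining | 0x80);
            remaining >>= 7;
        }
        scratch[length++] = (byte)remaining;
        _target.Write(scratch[..length]);
    }

    public void WriteString(string value)
    {
        Write7BitEncodedInt(Encoding.UTF8.GetByteCount(value));
        _target.Write(Encoding.UTF8.GetBytes(value)); // sketch only: real code avoids this alloc
    }
}
```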

change the output capture to use sequence chains

we change OutputCacheStream to use a new RecyclableSequenceBuilder instead of SegmentWriteStream (which was in "shared", and is now removed), which has similar Write APIs, but no longer pretends to be a Stream, and uses RecyclableReadOnlySequenceSegment and leased buffers for the backing store; the DetachAndReset() API hands back a ReadOnlySequence<byte> of the captured payload
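the capture side can be sketched roughly like this (assumption: hypothetical simplified types; the real RecyclableSequenceBuilder also recycles its segment objects, and the caller is responsible for returning the leased buffers once the detached sequence is released):

```csharp
using System;
using System.Buffers;

var builder = new SequenceBuilder();
builder.Write(new byte[] { 1, 2, 3 });
builder.Write(new byte[] { 4, 5 });
ReadOnlySequence<byte> payload = builder.DetachAndReset();
Console.WriteLine(payload.Length); // 5

sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(ReadOnlyMemory<byte> memory, ChunkSegment? previous)
    {
        Memory = memory;
        if (previous is not null)
        {
            RunningIndex = previous.RunningIndex + previous.Memory.Length;
            previous.Next = this;
        }
    }
}

// write-only capture: appends into leased buffers, sealing each full buffer
// into a segment chain; DetachAndReset hands the chain back as a sequence
sealed class SequenceBuilder
{
    private const int ChunkSize = 8192;
    private byte[] _current = ArrayPool<byte>.Shared.Rent(ChunkSize);
    private int _used;
    private ChunkSegment? _head, _tail;

    public void Write(ReadOnlySpan<byte> data)
    {
        while (!data.IsEmpty)
        {
            if (_used == _current.Length)
            {
                SealCurrentChunk();
                _current = ArrayPool<byte>.Shared.Rent(ChunkSize);
                _used = 0;
            }
            int take = Math.Min(data.Length, _current.Length - _used);
            data[..take].CopyTo(_current.AsSpan(_used));
            _used += take;
            data = data[take..];
        }
    }

    public ReadOnlySequence<byte> DetachAndReset()
    {
        SealCurrentChunk();
        var result = new ReadOnlySequence<byte>(_head!, 0, _tail!, _tail!.Memory.Length);
        _head = _tail = null; // caller now owns the leased buffers
        _current = ArrayPool<byte>.Shared.Rent(ChunkSize);
        _used = 0;
        return result;
    }

    private void SealCurrentChunk()
    {
        _tail = new ChunkSegment(_current.AsMemory(0, _used), _tail);
        _head ??= _tail;
    }
}
```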

recycle the cache-entry as needed

in the middleware, we use a new ReleaseCachedResponse() API that cleans up and recycles (if necessary) the OutputCacheEntry; this works similarly regardless of whether the current request built the data, or whether it was materialized; to help with this, we decorate the cached response as nullable to formalize that the state could be empty, and fix up call paths

intentionally, we only do this in "known good" code paths - we don't use finally / using to avoid timing problems with async exceptions (which could still have sight of the leased buffers)

add new performance profiling project

(and backport the same to the original code)

before vs after

key observations:

  • write path before could allocate ~5x the payload size; now we allocate ~1x, which is explained by the need for the byte[] that the IOutputCacheStore API demands
  • read path before allocated ~1x the payload size; now there are no lost overheads
  • performance is improved in all scenarios, but especially on the read path, which is now O(1) in allocations thanks to zero-copy of the main payload (at least for the in-process scenario)

@adityamandaleeka
Member

cc @sebastienros

@sebastienros
Member

Let's build a benchmark ;)

Member

@BrennanConroy BrennanConroy left a comment

Couple high level nits, haven't looked much at the actual code yet

@mgravell
Member Author

@gfoidl @BrennanConroy nits resolved - great feedback, thanks!

@sebastienros totally agree we should add a crank profile that stresses output-cache (ideally with both small and large payloads - maybe query-string length?), and get that merged before this (so we can compare with/without) - I'll take a stab at that, but I might be looking for some of your input there

@mgravell mgravell marked this pull request as ready for review May 24, 2023 11:05
@mgravell mgravell requested a review from Tratcher as a code owner May 24, 2023 11:05
Member

@BrennanConroy BrennanConroy left a comment

The removal of "Tags" is unclear to me.

<ItemGroup>
<Reference Include="Microsoft.AspNetCore.OutputCaching" />
<Reference Include="BenchmarkDotNet" />
<Reference Remove="Microsoft.CodeAnalysis.PublicApiAnalyzers" />
Member

What is this for?

Member Author

the project? investigating the overheads; I could, however, support burning this now if it is ugly to maintain

Member

No, this specific line
<Reference Remove="Microsoft.CodeAnalysis.PublicApiAnalyzers" />

Member

@captainsafia captainsafia Jul 31, 2023

investigating the overheads;

Can you clarify what you mean here? I would think that it's not necessary to include the PublicApi analyzers on a benchmark app where we are not shipping an API.

EDIT: Nevermind, I see this is actually disabling the analyzers.

Member Author

yeah, exactly that; I didn't want warning noise about public API in something that isn't public API - this might have originally been an incorrect path thing, but... it doesn't do anything painful


result.Tags = tags;
return result;
static readonly string[] CommonHeaders = new string[]
Member

@Tratcher probably has opinions about this

@mgravell
Member Author

The removal of "Tags" is unclear to me.

@BrennanConroy from the entry? It is never used, and isn't part of the public API, yet by existing it requires deserialization into string each time, so that's allocations and UTF8 decode (plus an allocation/lease for the collection to put them in), every time, for absolutely no reason

@amcasey amcasey added the area-middleware Includes: URL rewrite, redirect, response cache/compression, session, and other general middlewares label Jun 2, 2023
@ghost

ghost commented Jun 14, 2023

Looks like this PR hasn't been active for some time and the codebase could have been changed in the meantime.
To make sure no breaking changes are introduced, please leave an /azp run comment here to rerun the CI pipeline and confirm success before merging the change.

@ghost ghost added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jun 14, 2023
@mgravell
Member Author

(rebased on top of redis output cache changes)

- unify OutputCacheEntry and FormatterEntry
- leased buffers for headers, tags, etc (dispose on way out)
- use ReadOnlySequence<byte> instead of List<byte[]> with recyclable segments
- avoid copying the payload data once fetched
- serialization tweak: use common headers (not yet listed)

finish porting tests; all good

migrate writes to IBufferWriter<byte>, using recyclable array buffer writer (similar to MemoryStream, but: faster)

use pooled buffers when buffering output-cache payloads; RecyclableReadOnlySequenceSegment (RROSS) should be able to recycle buffers as needed

implement header name/value lookup buffer

add bench project

note memory overhead in bench

full bench suite

add benchmark result

- don't store tags in the cache payload
- don't store segment details in the cache payload

whitespace

include test that uses body-writer rather than stream

tidy up benchmarks

remove pipe impl; we don't need it (proven in bench)

add a few more known headers; don't store request-id

don't write empty segments

fix buffer cleanup paths

revert changes to shared SegmentWriteStream

update numbers

use fixed size stackalloc

Update src/Middleware/OutputCaching/src/OutputCacheEntryFormatter.cs

Co-authored-by: Günther Foidl <[email protected]>

- remove global SkipLocalsInit
- add accessibility on all new members
- in length checks, use - over + to avoid overflow

nits

relocate perf to microbenchmarks

use standard project structure for micro-benchmarks

use "actual" instead of "bytes" to avoid a "pop"/"ld" in the "release" case

avoid a memcopy by writing directly to the buffer bytes (and increasing DRY)

Update src/Middleware/OutputCaching/src/OutputCacheEntry.cs

Co-authored-by: Brennan <[email protected]>

Update src/Middleware/OutputCaching/test/OutputCacheEntryFormatterTests.cs

Co-authored-by: Brennan <[email protected]>

fix PR nits from #48450

fix sln (dead project)

Update src/Middleware/OutputCaching/src/OutputCacheEntry.cs

Co-authored-by: Brennan <[email protected]>

DRY nit

moar nits

use leased buffer for tags when calling SetAsync

Update src/Middleware/OutputCaching/src/OutputCacheEntryFormatter.cs

Co-authored-by: Brennan <[email protected]>

use [LoggerMessage] for logging

merge submodule delete

bump

optimize Output Cache; no API changes yet - all internal:
- unify OutputCacheEntry and FormatterEntry
- leased buffers for headers, tags, etc (dispose on way out)
- use ReadOnlySequence<byte> instead of List<byte[]> with recyclable segments
- avoid copying the payload data once fetched
- serialization tweak: use common headers (not yet listed)

sln fix

dammit
@@ -20,19 +20,13 @@ internal static long EstimateCachedResponseSize(OutputCacheEntry cachedResponse)
long size = sizeof(int);

// Headers
if (cachedResponse.Headers != null)
foreach (var item in cachedResponse.Headers.Span)
Member

Why is it safe to remove the null-check here?

Member Author

@captainsafia because it is now a ReadOnlyMemory<T> rather than a HeaderDictionary?

{
size += cachedResponse.Body.Length;
}
size += cachedResponse.Body.Length;
Member

Ditto here. Should we Debug.Assert if we know that Body and Headers are set at this point?

Member Author

@captainsafia it is now a ReadOnlySequence<byte> - worst case is that it is empty, which doesn't need any extra work

// additionally, we add support for reading a string with length specified by the caller (rather than handled automatically),
// and in-place (zero-copy) BLOB reads

private readonly ReadOnlyMemory<byte> original; // used to allow us to zero-copy chunks out of the payload
Member

Privates should be _ prefixed. _original, _root, etc.

Member Author

applied

private void RequestNewBuffer()
{
Flush();
var span = target.GetSpan(1024);
Member

Any reason to use 1024 specifically? Why even pass a value in?

Member Author

IMO "arbitrary size that we've at least hinted" > "arbitrary size that is implementation specific"; we don't want trivially small - ultimately any implementation can ignore us, but this seems a reasonable size to not be constantly swapping buffers; you're right that there's nothing special about the number; want me to add a // fairly arbitrary non-trivial size comment?

Member

ultimately any implementation can ignore us

Well, ignore the lower bound, they should always give a buffer >= the value passed in.

Sure, a comment is fine. It'd be interesting to see if real world examples could benefit from a larger buffer. 😃

Member Author

applied

@@ -110,8 +108,12 @@ private async Task InvokeAwaited(HttpContext httpContext, IReadOnlyList<IOutputC
// Can this request be served from cache?
if (context.AllowCacheLookup)
{
if (await TryServeFromCacheAsync(context, policies))
bool served = await TryServeFromCacheAsync(context, policies);
context.ReleaseCachedResponse(); // release even if not served due to failing conditions
Member

Should this be put in the finally instead of sprinkling it everywhere?

Member Author

the lifetime is a bit more complex than that - the cached response is changed in a few spots; in all the return cases you may be right, but the one you highlight here: can get overstomped again later, validly; I was also trying to not recycle buffers in any "cancellation token abort" scenarios, as I don't want to recycle anything if there's even a remote chance that async code (the output cache implementation) is still touching that buffer - I'd rather drop them on the floor than get competition

that was my logic; happy to add some comments, and maybe a bool releaseResponse for the common return case (so the only thing repeated is releaseResponse = true)?

Member

Maybe I'm missing something, but it looks like every single spot this method exits from, the cached response is released. And if I am missing something and it's more complex than that, there should probably be comments explaining it.

I don't want to recycle anything if there's even a remote chance that async code (the output cache implementation) is still touching that buffer

You can preserve that behavior:

var hasException = false;
try
{
}
catch
{
    hasException = true;
    throw;
}
finally
{
    if (!hasException)
    {
        // release
    }
}

Member Author

applied

@mgravell mgravell merged commit da234b9 into main Aug 3, 2023
@mgravell mgravell deleted the marc/ocbin branch August 3, 2023 15:42
@ghost ghost added this to the 8.0-rc1 milestone Aug 3, 2023
Labels
  • area-middleware (Includes: URL rewrite, redirect, response cache/compression, session, and other general middlewares)
  • area-perf (Performance infrastructure issues)
  • feature-output-caching
  • pending-ci-rerun (When assigned to a PR indicates that the CI checks should be rerun)