map very slow with larger composite keys #3354
Comments
Realized it may not be specific to maps; after adding some debug logging I found that the key is allocated on the heap on each iteration. The intent for this sort of cache is to reduce allocations, so it's important for the key to stay on the stack. The struct is 20 bytes, not tiny, but I'd still expect it to stay on the stack. Any pointers on where TinyGo decides whether to allocate a struct on the stack or the heap? Or could it be that it always allocates them on the heap?
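For illustration, here is a minimal sketch of the kind of cache this comment describes, assuming a hypothetical composite key of roughly the size mentioned; the type and field names are invented for the example, not taken from the actual project.

```go
package cache

// Hypothetical composite key: three uint32 fields plus a string header is
// 20 bytes on a 32-bit wasm target, roughly the size mentioned above.
type key struct {
	ruleID  uint32
	phase   uint32
	variant uint32
	name    string
}

// Cache keyed by the composite struct. For lookups to stay allocation-free,
// the key value built in Get must remain on the stack.
type Cache struct {
	entries map[key]int
}

func (c *Cache) Get(ruleID, phase, variant uint32, name string) (int, bool) {
	k := key{ruleID: ruleID, phase: phase, variant: variant, name: name}
	v, ok := c.entries[k]
	return v, ok
}
```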
For reference, these seem to be the allocation sizes for a single …
Ah, found this; it probably is map specific: https://github.com/tinygo-org/tinygo/blob/release/compiler/map.go#L99. Conversion to an interface would allocate. Has there ever been any thought put into generating code that computes the hash code for a struct inline instead of converting it to an interface? Do we not have runtime type information available for a struct like Go does in …
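As a sketch of the "inline hash code" idea (not anything TinyGo currently generates), a hand-written FNV-1a hash over the fields of a hypothetical key avoids converting the key to an interface at all:

```go
package cache

// Hypothetical composite key, repeated here so the snippet stands alone.
type key struct {
	ruleID uint32
	phase  uint32
	name   string
}

// hash computes FNV-1a over the key's fields directly, with no interface
// conversion and no heap allocation.
func (k key) hash() uint32 {
	const prime = 16777619
	h := uint32(2166136261) // FNV-1a offset basis
	for _, w := range [...]uint32{k.ruleID, k.phase} {
		for i := 0; i < 4; i++ {
			h ^= uint32(byte(w >> (8 * i)))
			h *= prime
		}
	}
	for i := 0; i < len(k.name); i++ {
		h ^= uint32(k.name[i])
		h *= prime
	}
	return h
}
```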
BTW, JFYI, I tried switching the string field to a pointer field (of course this is only a valid change in certain circumstances) and could confirm all allocations are gone; the ones shown before must indeed have been from converting the key to an interface, plus presumably some other helper objects allocated during reflection. Wasm is then about 5-10x slower than native; while I wish wasm were faster, this is much closer to what I'd expect and is workable :) Hoping there is a path towards supporting …
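A hedged before/after sketch of that change, with the same caveat the comment gives: with a pointer field, map equality compares pointer identity rather than string contents, so it only works when equal names are guaranteed to share the same `*string` (for example via an interning table).

```go
package cache

// Before: the key embeds a string header, so it goes through the
// interface/reflection-based hashing path discussed above, which allocates.
type keyWithString struct {
	id   uint32
	name string
}

// After: the string is replaced by a pointer. Equality is now pointer
// identity, so this is only valid when callers reuse the same *string
// for equal names (e.g. an interning table populated at startup).
type keyWithPointer struct {
	id   uint32
	name *string
}
```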
The map type has seen very little optimization effort, as you can see. In this case, it seems like the interface should have been stack allocated (by the heap-to-stack optimization) but apparently it wasn't.
Not necessarily, but it may be hard for the compiler to prove it is safe to allocate on the stack. Have you tried …?
Ah thanks, hadn't tried that flag before, looks pretty useful. Unfortunately I guess it's not able to introspect the code generated by the compiler, e.g. …
Yeah … Is this with the fix to avoid strings in the struct that you mentioned? Because in that case you won't see the allocation; you just avoided it. If you undo it, you should be able to see the allocation again (I'm guessing somewhere in your main package).
Oops, duh, thanks for noticing that. Not sure, but it looks like it's saying …
Taking ownership of …
The escape logic is being confused by the function pointers in the …
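For readers following along, a loose sketch of why function-pointer fields get in the way; the struct below is modeled on the general shape of a runtime hashmap header, with field names assumed rather than copied from TinyGo:

```go
package escsketch

import "unsafe"

// A hashmap header that carries its hash and equality functions as
// function-pointer fields (names here are assumptions for illustration).
type hashmap struct {
	buckets  unsafe.Pointer
	count    uintptr
	keyHash  func(key unsafe.Pointer, size, seed uintptr) uint32
	keyEqual func(x, y unsafe.Pointer, size uintptr) bool
}

// lookup hashes the key through an indirect call. Because the callee is
// only known at run time, escape analysis cannot see whether it retains
// key, so it conservatively treats the key as escaping and callers end
// up heap-allocating the temporary key value.
func lookup(m *hashmap, key unsafe.Pointer, size uintptr) uint32 {
	return m.keyHash(key, size, 0)
}
```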
Hmm… can we even apply …?
Yes, I think so. I don't see where reflection would keep the pointer alive after return (it certainly shouldn't).
Oh right, I was thinking …
I think the fix here is to tag more of these runtime hashmap methods so the compiler knows their arguments don't escape. For example, in … the compiler doesn't know that … I just did this for …
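As a generic illustration of the distinction being discussed (plain Go, not TinyGo's actual annotation mechanism): once the compiler can prove, or is told, that a callee does not retain its pointer argument, the caller can keep the value on the stack.

```go
package escsketch

var sink *int // anything stored here escapes to the heap

// retains keeps its argument reachable after it returns, so callers must
// heap-allocate whatever they pass in.
func retains(p *int) { sink = p }

// doesNotRetain only reads through the pointer; the argument does not
// escape, so callers can keep it on the stack.
func doesNotRetain(p *int) int { return *p }

func caller() int {
	x := 42
	// Calling retains(&x) would force x onto the heap; doesNotRetain(&x)
	// lets it stay on the stack. Tagging the runtime hashmap helpers so the
	// compiler knows they behave like doesNotRetain is the idea above.
	return doesNotRetain(&x)
}
```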
@dgryski yes sounds reasonable!
When integrating a cache in an attempt to improve performance by reducing allocations, I found that it severely reduced performance instead, with map operations adding milliseconds of overhead. I tried making a simple benchmark to simulate similar behavior (notably, a large-ish composite key). It's hard to isolate processing vs FFI, but I added a 100-iteration loop within wasm to try to help. I had originally hoped to make it larger, but it was too slow to complete in a meaningful time.
https://github.com/corazawaf/coraza-proxy-wasm/compare/main...anuraaga:coraza-1225?expand=1#diff-173fbfd8d8844658344b121461b4290d0a85230caae9825240705df8130e8b75
There are of course many variables here, such as the performance of the wasm runtime (wazero), though I noticed slow map performance in real-world usage with Envoy (v8) too. Even assuming wasm is slower than native code, this seems orders of magnitude slower than I would expect in general.
Is this known performance behavior? I saw in #1553 that growing logic was added so hashmaps shouldn't end up with quadratic performance without an appropriate size hint, but in this benchmark the number of keys is about 2400 vs a size hint of 10000 so growing or load factor shouldn't be much of an issue either way.
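For reference, a hedged sketch of the benchmark shape described here (a plain `go test` benchmark rather than the actual wasm harness in the linked branch; the key type and counts are approximations of what is described above):

```go
package bench

import (
	"strconv"
	"testing"
)

// Hypothetical large-ish composite key, standing in for the real one.
type ruleKey struct {
	ruleID  uint32
	phase   uint32
	variant uint32
	target  string
}

func BenchmarkCompositeKeyMap(b *testing.B) {
	m := make(map[ruleKey]int, 10000) // same size hint as mentioned above
	for i := 0; i < b.N; i++ {
		for j := 0; j < 2400; j++ { // roughly the number of distinct keys
			k := ruleKey{
				ruleID:  uint32(j),
				phase:   uint32(j % 5),
				variant: uint32(j % 3),
				target:  "ARGS:" + strconv.Itoa(j%8),
			}
			m[k]++
		}
	}
}
```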
I tried setting `runtime_memhash_tsip` but saw no real change. For reference, the real-world code the benchmark attempts to reflect:
corazawaf/coraza#537