Improve performance for schemas with many fields #140
Closed
I did some performance testing and found noticeable overhead once the schema grows.
Commit ccd7f68 introduces an `invariantf` function that formats the error message with fmt.Sprintf. I did this because Printf regularly showed up near the top of CPU profiles.
I traced the Printf calls to defineFieldMap and started replacing them there, then extended the change to every occurrence I found. The change almost halved query processing time for a schema with 20 fields and brought allocations down from ~2800 to ~1800.
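As a rough illustration of the idea (not the code from ccd7f68), the sketch below assumes `invariantf` takes a condition, a format string and arguments, and only calls fmt.Sprintf when the condition fails. The signature and the `invariantEager` helper are assumptions made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// invariantEager is a hypothetical stand-in for the older pattern: the
// caller has to build the message (typically with fmt.Sprintf) before the
// call, so the formatting cost is paid even when the condition holds.
func invariantEager(condition bool, message string) error {
	if !condition {
		return errors.New(message)
	}
	return nil
}

// invariantf (assumed signature) defers formatting to the failure path.
// When the condition holds, fmt.Sprintf is never called.
func invariantf(condition bool, format string, args ...interface{}) error {
	if !condition {
		return errors.New(fmt.Sprintf(format, args...))
	}
	return nil
}
```

A call such as `invariantf(name != "", "field %q must have a name", name)` then skips the formatting work on the common path, where the schema is valid.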
In fdb793c I tried to eliminate the next biggest source of CPU usage, which was the regex in `assertValidName`. Surprisingly, this only resulted in a ~26% decrease in query execution time.
58481fc only avoids slice reallocations in a few places where the final slice length is known. This didn't increase performance much (~1%), but since the garbage collector is already the biggest source of CPU usage, I left it in anyway.
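The two techniques can be sketched roughly as follows. This is not the code from fdb793c or 58481fc: `isValidName`, `field` and `collectNames` are hypothetical names, and the sketch assumes the name check enforces the GraphQL name grammar `[_A-Za-z][_0-9A-Za-z]*`.

```go
package main

// isValidName checks a name against [_A-Za-z][_0-9A-Za-z]* with a plain
// byte loop instead of a regular expression match.
func isValidName(name string) bool {
	if name == "" {
		return false
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		switch {
		case c == '_', 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z':
			// Letters and underscore are allowed in any position.
		case '0' <= c && c <= '9':
			// Digits are allowed anywhere except the first position.
			if i == 0 {
				return false
			}
		default:
			return false
		}
	}
	return true
}

// field is a hypothetical, minimal stand-in for a schema field.
type field struct {
	Name string
}

// collectNames allocates the result once with the final capacity, so
// append never has to grow and copy the backing array.
func collectNames(fields []field) []string {
	names := make([]string, 0, len(fields))
	for _, f := range fields {
		names = append(names, f.Name)
	}
	return names
}
```

The preallocation only helps where the final length is known up front. It mostly reduces garbage, which fits the small (~1%) gain reported above.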
I did some benchmarks comparing the changes to master with varying schema sizes.
schema with 1 field:
schema with 20 fields:
schema with 100 fields:
The source for the benchmarks can be found at: https://gist.github.com/sfriedel/81793288bcc373988f6a000062d051ea
I tested the changes, and both the test suite and my application run flawlessly with them, but it would still be good if someone else took a look.