-
Notifications
You must be signed in to change notification settings - Fork 186
fix: Access to tracked-struct that was freed during fixpoint #817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Access to tracked-struct that was freed during fixpoint #817
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
CodSpeed Performance ReportMerging #817 will not alter performanceComparing Summary
|
Unclear why this would improve performance unless it's tracing |
Formatting can have effects on codegen (that includes tracing, println etc) as that captures the address of locals. Looking at the flamegraph this perf is mainly due to inline differences here, see #818 |
db.assert_logs(expect![[r#" | ||
[ | ||
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(401)) })", | ||
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(403)) })", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The id
change here is because Salsa reused the IDs between iterations, even though the Output
had different values. Now, Salsa keeps all intermediate constructed Output
structs around (in the same revision).
src/function/execute.rs
Outdated
// (and are owned by the query) alive even if the query in this iteratoin no longer creates them. | ||
// The query not re-creating the tracked struct doesn't guarantee that there | ||
// aren't any other queries depending on it. | ||
if old_memo.verified_at.load() == revision_now && old_memo.may_be_provisional() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first version didn't had the check that old_memo
is from the same revision. This had the nice side-effect of allowing us to remove the provisional
from remove_stale_output
and it unlocked reusing mid-cycle query results on tracked structs.
One such example is the revalidate_with_change_after_output_read
test case where query_a
creates a new Output
in each iteration and calls read_value(db, output)
. The intermediate read_value
query results can't be reused across revisions if the check old_memo.verified_at.load() == revision_now
exists because their Output'
s only get created in later iterations of query_a
and Salsa frees all Output
s after the first iteration that haven't been created yet.
We could remove the old_memo.verified_at.load() == revision_now
restriction here but my concern is that this might lead to ever-growing output lists for intermediate queries that never get finalized. This shouldn't happen for most queries because they're likely to create the same outputs but this isn't guaranteed.
What's not entirely clear to me is that we can't remove the provisional
special handling in remove_stale_output
if we have the check here. I believe it is because we end up reading some tracked structs when validating dependend tracked struct queries. This is unfortunate. It would have been nice if we could remove that special handling all together
c82f043
to
45b6b27
Compare
Hmm, I can't reproduce the test failures locally. |
de92848
to
5e80984
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tracking this down!! The fix looks reasonable to me, and this looks ready to land to me, once we solve out the test failures.
Hmm, I also can't reproduce the test failure on my arch x86 system :( |
src/function/execute.rs
Outdated
// The query not re-creating the tracked struct doesn't guarantee that there | ||
// aren't any other queries depending on it. | ||
if old_memo.may_be_provisional() && old_memo.verified_at.load() == revision_now { | ||
active_query.seed_outputs(old_memo.revisions.origin.outputs()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the test failure but given that the order of inputs and outputs for dependencies is fairly important, it might be more correct to seed the outputs after having executed the query such that the seeding does not influence the executions dependency order? Maybe that is causing issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try this. I'd be surprised if it's the case because input_outputs
isn't used for ID seeding. But who knows, maybe it changes maybe_changed_after
execution order (although it shouldn't?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, but figured it might be worth a try. Seems to not have been the case though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep this regardless because I think it's nice to keep the input/output order as close as possible to the actual execution order.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
6677389
to
e2e9039
Compare
Uff, found it. The problem was that discarding the tracked struct depended on the internal hash set ordering which means that the order of tracaked structs in the free list were different. Now, lets clean up the mess that I created |
fe818f7
to
b4497d9
Compare
bbe59f5
to
c9dd58f
Compare
Fixes a bug in fixpoint iteration where salsa freed a tracked struct created in the cycle's first iteration but not in later iterations and was part of the final result of the cycle. Reading any field from that tracked struct resulted in a panic.
Tracked structs are owned by the query that created them and are freed when they aren't recreated when the query is re-executed. Fixpoint complicates things because a tracked struct might be created in the first iteration of a query but not in its second iteration. However, freeing the tracked struct from the first iteration isn't safe because it might now be used as part of the result from other queries participating in the same cycle. Therefore, the query (memo) in the final iteration must own all tracked structs that it created in previous iterations.
The way this is implemented in this PR is that salsa now copies over the outputs from previous provisional memos that were created in the same revision. There's an inline comment explaining why the logic is limited to the same revision. It would be great if we could lift that constraint but I believe it introduces a memory leak if we do.
I'm not sure what the source is for the tests failing on GitHub runners. They pass locally (x86 vs arm). I'll open this for review to get some initial feedback. I'll try to identify the cause of the failing tests if we think this is a direction worth pursuing.