Skip to content

Conversation

MichaReiser
Copy link
Contributor

@MichaReiser MichaReiser commented Apr 24, 2025

Fixes a bug in fixpoint iteration where salsa freed a tracked struct created in the cycle's first iteration but not in later iterations and was part of the final result of the cycle. Reading any field from that tracked struct resulted in a panic.

Tracked structs are owned by the query that created them and are freed when they aren't recreated when the query is re-executed. Fixpoint complicates things because a tracked struct might be created in the first iteration of a query but not in its second iteration. However, freeing the tracked struct from the first iteration isn't safe because it might now be used as part of the result from other queries participating in the same cycle. Therefore, the query (memo) in the final iteration must own all tracked structs that it created in previous iterations.

The way this is implemented in this PR is that salsa now copies over the outputs from previous provisional memos that were created in the same revision. There's an inline comment explaining why the logic is limited to the same revision. It would be great if we could lift that constraint but I believe it introduces a memory leak if we do.

I'm not sure what the source is for the tests failing on GitHub runners. They pass locally (x86 vs arm). I'll open this for review to get some initial feedback. I'll try to identify the cause of the failing tests if we think this is a direction worth pursuing.

Copy link

netlify bot commented Apr 24, 2025

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit c9dd58f
🔍 Latest deploy log https://app.netlify.com/sites/salsa-rs/deploys/680f8d85a0c62000080d5711

Copy link

codspeed-hq bot commented Apr 24, 2025

CodSpeed Performance Report

Merging #817 will not alter performance

Comparing MichaReiser:untracked-read-cycle-tracked-struct (c9dd58f) with master (0cbe7f8)

Summary

✅ 12 untouched benchmarks

@MichaReiser
Copy link
Contributor Author

Unclear why this would improve performance unless it's tracing

@Veykril
Copy link
Member

Veykril commented Apr 25, 2025

Formatting can have effects on codegen (that includes tracing, println etc) as that captures the address of locals. Looking at the flamegraph this perf is mainly due to inline differences here, see #818

db.assert_logs(expect![[r#"
[
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(401)) })",
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(403)) })",
Copy link
Contributor Author

@MichaReiser MichaReiser Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The id change here is because Salsa reused the IDs between iterations, even though the Output had different values. Now, Salsa keeps all intermediate constructed Output structs around (in the same revision).

// (and are owned by the query) alive even if the query in this iteratoin no longer creates them.
// The query not re-creating the tracked struct doesn't guarantee that there
// aren't any other queries depending on it.
if old_memo.verified_at.load() == revision_now && old_memo.may_be_provisional() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first version didn't had the check that old_memo is from the same revision. This had the nice side-effect of allowing us to remove the provisional from remove_stale_output and it unlocked reusing mid-cycle query results on tracked structs.

One such example is the revalidate_with_change_after_output_read test case where query_a creates a new Output in each iteration and calls read_value(db, output). The intermediate read_value query results can't be reused across revisions if the check old_memo.verified_at.load() == revision_now exists because their Output's only get created in later iterations of query_a and Salsa frees all Outputs after the first iteration that haven't been created yet.

We could remove the old_memo.verified_at.load() == revision_now restriction here but my concern is that this might lead to ever-growing output lists for intermediate queries that never get finalized. This shouldn't happen for most queries because they're likely to create the same outputs but this isn't guaranteed.

What's not entirely clear to me is that we can't remove the provisional special handling in remove_stale_output if we have the check here. I believe it is because we end up reading some tracked structs when validating dependend tracked struct queries. This is unfortunate. It would have been nice if we could remove that special handling all together

@MichaReiser MichaReiser changed the title Access to tracked-struct that was freed during fixpoint fix: Access to tracked-struct that was freed during fixpoint Apr 25, 2025
@MichaReiser MichaReiser added the bug Something isn't working label Apr 25, 2025
@MichaReiser MichaReiser force-pushed the untracked-read-cycle-tracked-struct branch 2 times, most recently from c82f043 to 45b6b27 Compare April 25, 2025 09:30
@MichaReiser
Copy link
Contributor Author

Hmm, I can't reproduce the test failures locally.

@MichaReiser MichaReiser force-pushed the untracked-read-cycle-tracked-struct branch 4 times, most recently from de92848 to 5e80984 Compare April 25, 2025 10:12
@MichaReiser MichaReiser requested a review from carljm April 25, 2025 10:28
@MichaReiser MichaReiser marked this pull request as ready for review April 25, 2025 10:28
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tracking this down!! The fix looks reasonable to me, and this looks ready to land to me, once we solve out the test failures.

@MichaReiser
Copy link
Contributor Author

MichaReiser commented Apr 28, 2025

Hmm, I also can't reproduce the test failure on my arch x86 system :(

// The query not re-creating the tracked struct doesn't guarantee that there
// aren't any other queries depending on it.
if old_memo.may_be_provisional() && old_memo.verified_at.load() == revision_now {
active_query.seed_outputs(old_memo.revisions.origin.outputs());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the test failure but given that the order of inputs and outputs for dependencies is fairly important, it might be more correct to seed the outputs after having executed the query such that the seeding does not influence the executions dependency order? Maybe that is causing issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try this. I'd be surprised if it's the case because input_outputs isn't used for ID seeding. But who knows, maybe it changes maybe_changed_after execution order (although it shouldn't?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, but figured it might be worth a try. Seems to not have been the case though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep this regardless because I think it's nice to keep the input/output order as close as possible to the actual execution order.

@MichaReiser

This comment was marked as resolved.

@MichaReiser

This comment was marked as resolved.

@MichaReiser MichaReiser force-pushed the untracked-read-cycle-tracked-struct branch from 6677389 to e2e9039 Compare April 28, 2025 13:03
@MichaReiser
Copy link
Contributor Author

MichaReiser commented Apr 28, 2025

Uff, found it. The problem was that discarding the tracked struct depended on the internal hash set ordering which means that the order of tracaked structs in the free list were different.

Now, lets clean up the mess that I created

@MichaReiser MichaReiser force-pushed the untracked-read-cycle-tracked-struct branch from fe818f7 to b4497d9 Compare April 28, 2025 14:10
@MichaReiser MichaReiser force-pushed the untracked-read-cycle-tracked-struct branch from bbe59f5 to c9dd58f Compare April 28, 2025 14:15
@MichaReiser MichaReiser enabled auto-merge April 28, 2025 14:20
@MichaReiser MichaReiser added this pull request to the merge queue Apr 28, 2025
Merged via the queue into salsa-rs:master with commit b27e392 Apr 28, 2025
11 checks passed
@MichaReiser MichaReiser deleted the untracked-read-cycle-tracked-struct branch April 28, 2025 14:34
@github-actions github-actions bot mentioned this pull request Apr 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants