Skip to content

Fix construction of conflict sets once and for all (WIP) #3357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Apr 23, 2016
Merged

Fix construction of conflict sets once and for all (WIP) #3357

merged 40 commits into from
Apr 23, 2016

Conversation

edsko
Copy link
Contributor

@edsko edsko commented Apr 17, 2016

At @kosmikus ' request I have started a new solver branch in the main repository so that we can push solver fixes to this branch and then once we are happy with them, push them to master.

I realized that my understanding of conflict sets and goal reason chains was flawed. I think it's not just me though; they badly need explaining and documenting. To this end I started writing a blog post about the relation between conflict sets, pre assignments, constrained instances, and goal reason chains; a first draft can be found at http://www.well-typed.com/blog/preview/backjumping/ .

On this branch so far we first have @grayjay 's test case for backjumping after the SIR check. We previously fixed this in #3221, but I now realize the fix there was incorrect. (In my defence, I did mention I was not very convinced.) I hope it is correct now (and if so, this PR will obsolete #3221). In addition we have a fix to the cycle check detection (#3170). That check too was wrong, and for similar reasons. The correct solution is also similar in both cases. Instead of collecting which packages are involved in the conflict, we instead collect Goals (packages + goal reason chains). Then to construct the conflict set we construct conflict sets for each of these individual goals and then merge them. This is just a generalization of the process that happens in during validation (described in the aforelinked blog post). For the cycle check this is a tad awkward since we start with the final reverse dependency map which does not include goal reason chains, but we can just record these separately.

Sadly I am running out of time now. I was hoping to get more done today but I cannot. However, I think what needs to happen in the linker is similar (#3327) : instead of trying to construct a separate lgBlame set, we instead should collect Goals as members of the link group, and use those to construct the conflict set. separate lgBlame is not necessary (I hope); though perhaps @grayjay will correct me on that. Note that #3268 is (mostly) independent from this, as that one is less about constructing conflict set as it is about constructing the right link groups in the first place. That said, that PR should probably also be rebased to this branch (or at least, my version of it at https://github.com/edsko/cabal/tree/pr/FixLinkDeps ; still on my TODO list).

It is somewhat frustrating that this is so difficult, but this is a subtle topic and I think fixing these issues and documenting and writing this up is rather important.

@grayjay can I ask you for a favour? Can you rebase all your evil solver unit tests that are still missing from master and add them to this solver branch, even if they are currently failing? That would be very helpful.

edsko and others added 4 commits April 17, 2016 10:54
This commit is a pure refactoring, no semantic changes. Submitting as separate
PR at @kosmikus request.

Tihs moves `ConflictSet` and `Var` into their own modules.
In the original implemented we recorded only the two conflicting packages; in
the "fix" in #3221 we include the GRC for one of the two packages, but not the
other. This is also wrong, although slightly less so and @kosmikus tells me
that it's actually rather difficult to find an example that exposes the
remaining inaccuracy. I'll have to take him at his word on that :) In this
commit we use the GRC for both goals instead.
The cycle check was not constructing the correct conflict sets; when we found a
cycle, we constructed a conflict set consisting of all choices that we made so
far, as long as those choices directly involved a package in the cycle. But
sometimes the dependencies are more subtle than that. Now instead we construct
conflict sets from all packages _and their goal reason chains_ that were
involved in the cycle and then union all those.
@grayjay
Copy link
Collaborator

grayjay commented Apr 18, 2016

Will this branch be merged into 1.24, if there's enough time?

@edsko I'll cherry-pick my tests and your comments on the tests. I can also rebase pr/FixLinkDeps.

Instead of collecting which packages are involved in the conflict, we instead collect Goals (packages + goal reason chains).

Does this change apply to all conflict sets? Will the new ConflictSet type only allow insertion of Goals?

separate lgBlame is not necessary (I hope); though perhaps @grayjay will correct me on that.

Here's the reason that I thought we needed lgBlame. I think it is possible for packages to become linked for reasons other than the actual linking choice or the introduction of the packages' goals. lgBlame stores those other variables so that they can be added to the conflict set if the link group ever leads to a failure.

Say A depends on B and C, and introduces both goals. B also depends on C, but only when flagB is true. For some dependency problem, the solver has to choose 0.B and 1.B. If it links 0.B and 1.B and tries +flagB, it will have to link 0.C and 1.C. lgBlame for 0.C and 1.C's link group will contain 0.flagB. If linking 1.C leads to a failure, the solver should try -flagB. Without lgBlame, the conflict set would only include 0.C, 1.C, 0.A, and 1.A (as well as any packages that introduced 0.A or 1.A).

@edsko
Copy link
Contributor Author

edsko commented Apr 18, 2016

Will this branch be merged into 1.24, if there's enough time?

I really hope so.

@edsko I'll cherry-pick my tests and your comments on the tests. I can also rebase pr/FixLinkDeps.

If you can rebase pr/FixLinkDeps and push it to this branch, that would be awesome. I am incredibly strapped for time at the moment.

Instead of collecting which packages are involved in the conflict, we instead collect Goals (packages + goal reason chains).

Does this change apply to all conflict sets? Will the new ConflictSet type only allow insertion of Goals?

I think it would make sense to say that ConflictSets can only be constructed from two operations: from a Goal, or by merging other ConflictSets. If that is not the case, then the exceptions would be quite illustrative. But perhaps @kosmikus can comment here as well.

separate lgBlame is not necessary (I hope); though perhaps @grayjay will correct me on that.

Here's the reason that I thought we needed lgBlame. I think it is possible for packages to become linked for reasons other than the actual linking choice or the introduction of the packages' goals. lgBlame stores those other variables so that they can be added to the conflict set if the link group ever leads to a failure.

Say A depends on B and C, and introduces both goals. B also depends on C, but only when flagB is true. For some dependency problem, the solver has to choose 0.B and 1.B. If it links 0.B and 1.B and tries +flagB, it will have to link 0.C and 1.C. lgBlame for 0.C and 1.C's link group will contain 0.flagB. If linking 1.C leads to a failure, the solver should try -flagB. Without lgBlame, the conflict set would only include 0.C, 1.C, 0.A, and 1.A (as well as any packages that introduced 0.A or 1.A).

Yes, it is important that flagB will be included in the conflict set here. However, in your example, the dependency on C will have a goal reason chain that includes flagB; so, we if base this on Goals (packages + goal reason chains), I think this will be okay? I could be wrong though.

@kosmikus
Copy link
Contributor

Instead of collecting which packages are involved in the conflict, we instead collect Goals (packages + goal reason chains).

Does this change apply to all conflict sets? Will the new ConflictSet type only allow insertion of Goals?

I think it would make sense to say that ConflictSets can only be constructed from two operations: > from a Goal, or by merging other ConflictSets. If that is not the case, then the exceptions would be quite illustrative. But perhaps @kosmikus can comment here as well.

Yes, I think that makes sense. If we need other ways to create ConflictSets for whatever reason, it still makes sense to explicitly add them to the ConflictSet module now that the type is abstract.

separate lgBlame is not necessary (I hope); though perhaps @grayjay will correct me on that.

Here's the reason that I thought we needed lgBlame. I think it is possible for packages to become linked for reasons other than the actual linking choice or the introduction of the packages' goals. lgBlame stores those other variables so that they can be added to the conflict set if the link group ever leads to a failure.

Say A depends on B and C, and introduces both goals. B also depends on C, but only when flagB is true. For some dependency problem, the solver has to choose 0.B and 1.B. If it links 0.B and 1.B and tries +flagB, it will have to link 0.C and 1.C. lgBlame for 0.C and 1.C's link group will contain 0.flagB. If linking 1.C leads to a failure, the solver should try -flagB. Without lgBlame, the conflict set would only include 0.C, 1.C, 0.A, and 1.A (as well as any packages that introduced 0.A or 1.A).

Yes, it is important that flagB will be included in the conflict set here. However, in your example, the dependency on C will have a goal reason chain that includes flagB; so, we if base this on Goals (packages + goal reason chains), I think this will be okay? I could be wrong though.

Hmm, I'm not sure here, because I'm still not completely confident about some aspects of linking. Normally, if A already depends on C, then it's perfectly possible for C to get GRC A only. If it's later discovered that B also depends on C under certain circumstances, then that GRC would not be updated. In the unlinked scenario, this is ok. If we figure out that there's a conflict involving C, then we do not have a chance to avoid the C goal if we not at least change the choice for A.

With linking, the situation is a bit more complicated. If we link 0.B and 1.B (but not 0.A and 1.A if they both exist), then as @grayjay says, it depends on the choice of flagB if we a forced to link 0.C and 1.C or not. Once again, without looking at the code, the intuition here is the constraint that 0.C and 1.C have to be linked is introduced by the choice of flagB, so written as a constraint, it'd be something like

1.B:flagB => 0.C ~ 1.C

So indeed, it's clear that if a conflict is arising in the context of (not) linking 1.C to 0.C, then 1.B:flagB should definitely be a goal in the conflict set. The question is how we get there. For normal goals, we just have a set of active constraints and if we find a conflict between two constraints, we create a conflict set from the two goals that introduced these constraints. I'm not quite sure how this translates to link groups.

@kosmikus
Copy link
Contributor

@edsko Everything that is in this PR up until now looks ok to me.

@23Skidoo
Copy link
Member

@edsko

Will this branch be merged into 1.24, if there's enough time?

I really hope so.

Since these patches only touch cabal-install code, they can always go into a patch-level release of cabal-install.

@kosmikus
Copy link
Contributor

@23Skidoo That's true and good to know, but we are in the somewhat uncomfortable situation that we have merged certain new features (such as in particular the setup/component dependencies) that are currently quite fragile due to subtle bugs. Conflict-set related bugs mean that the solver is incomplete, i.e., it does not find solutions where there are any, and can be very difficult to work around.

On the one hand, it's fortunate that this only affects new features that aren't in wide use yet. On the other hand, it'd be nice not to spoil the experience that users have with these new features by shipping them in a state that is known to be broken.

I'm sorry that I have started looking at these so late. But unfortunately, the issues involved are quite subtle, and solving them properly requires a lot of thinking time. I think that after last week's discussions, @edsko and I are on a good track now, and I'm optimistic that with the help of @grayjay, the three of us can come up with a robust solution soon.

@23Skidoo
Copy link
Member

@kosmikus @hvr told me that there will be a fourth RC before the GHC 8 release, so we probably can (yet again) postpone the 1.24 release for a short while.

@kosmikus
Copy link
Contributor

@23Skidoo :) Ok, I'll try to keep the momentum up this time.

@hvr
Copy link
Member

hvr commented Apr 18, 2016

@23Skidoo Fwiw, RC4 is officially mentioned in the RC3 announcement: https://mail.haskell.org/pipermail/ghc-devs/2016-April/011862.html :-)

@kosmikus
Copy link
Contributor

@grayjay Ok, thanks. @edsko, in case you agree, just hit the button :)

@23Skidoo 23Skidoo assigned edsko and unassigned kosmikus Apr 22, 2016
@23Skidoo
Copy link
Member

23Skidoo commented Apr 22, 2016

@kosmikus Will there be any other solver fixes to merge into 1.24?

@kosmikus
Copy link
Contributor

kosmikus commented Apr 22, 2016

@23Skidoo If I'm keeping track correctly, then once this PR is merged, the following PRs can be closed because they have been integrated into / made obsolete by this one:

@edsko, @grayjay: Can you verify this list? Am I missing any PRs? Am I listing PRs that you think aren't fixed by this?

The only other possible candidate for 1.24 that I'm currently aware of and that isn't covered by this PR is #3290. If @edsko agrees that this one (#3357) is ready to merge, I'm planning to talk to him about what he thinks about #3290. I don't fully understand it yet (there are a number of changes outside of the solver, in code I'm less familiar with), and @ezyang seems to claim that it's still buggy ...

@ezyang
Copy link
Contributor

ezyang commented Apr 22, 2016

I restarted the build to see if it's a transient failure. But looking at the logs, it just seems like Travis got slower recently.

@grayjay
Copy link
Collaborator

grayjay commented Apr 23, 2016

@kosmikus The list of PRs looks correct. I can close mine.

I looked at http://scribu.github.io/travis-stats/#haskell/cabal, and one thing that has increased build times on master is the solver quickcheck tests, since the solver needs to be compiled again. I guess that's another reason to split the solver into a library. Another option is to add the tests to unit-tests. The only problem is that the solver tests can take minutes to find bugs. The default number of tests doesn't take long to run, it just isn't very thorough.

@edsko
Copy link
Contributor Author

edsko commented Apr 23, 2016

Merged with master 😁 Let me just for the record extend a huge kudos once more to @grayjay -- you're a star!

@23Skidoo I have closed all the outstanding PRs that were subsumed by this branch. As @kosmikus says, the only one remaining is #3232/#3290, and I'm inclined to propose that we do not try to merge that before release. However, Andres mentioned that @hvr or @dcoutts said that that PR is actually needed for the nix-local-build stuff to be useable. I don't fully understand why that would the case, but if true, then it's a higher priority than I thought.

@grayjay
Copy link
Collaborator

grayjay commented Apr 23, 2016

@edsko Thanks!

@kosmikus
Copy link
Contributor

Thanks to both @grayjay and @edsko for making this happen. Sorry again for letting the original PRs linger for so long, and thanks for nevertheless investing so much time now to make this happen pre-1.24 after all.

Regarding #3290: I think I had misunderstood which bug was responsible. Had a discussion with @dcoutts yesterday, and we verified that the main problem he was thinking about was a conflict set problem that is fixed by the solver branch. I would nevertheless still at least try to understand the current status of #3290 and the trade-offs involved before I make a final decision on whether to try to merge it pre-1.24 or not.

@23Skidoo 23Skidoo deleted the solver branch April 23, 2016 14:15
@23Skidoo
Copy link
Member

@kosmikus @edsko @grayjay Thanks, everyone!

I'll wait for a verdict on #3290 from @kosmikus before releasing 1.24 final.

@23Skidoo
Copy link
Member

@kosmikus

I've now backported the solver fixes from master to 1.24 (see #3373).

@grayjay
Copy link
Collaborator

grayjay commented Apr 23, 2016

@kosmikus @23Skidoo Thanks!

@kosmikus
Copy link
Contributor

@23Skidoo I unfortunately haven't managed to talk to @edsko over the weekend or today. But having looked a bit at #3290 myself, I think I agree with @edsko that we should do that post-1.24. It all still looks somewhat confusing to me, and while it's without doubt nice to have, it doesn't seem to be a big blocker either.

@23Skidoo
Copy link
Member

@kosmikus OK. Thanks for looking into this.

@edsko
Copy link
Contributor Author

edsko commented May 7, 2016

@grayjay I've emailed you with a link to a draft blog post about backjumping and conflict sets; I used the email address listed on your github profile. Let me know if this isn't the right address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants