Skip to content

Count conflicts to speed up solver #3513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 4, 2016
Merged

Conversation

kosmikus
Copy link
Contributor

@kosmikus kosmikus commented Jun 30, 2016

This change introduces a new flag --count-conflicts (enabled by default, can be disabled by saying --no-count-conflicts).

The idea is simple: while solving, we maintain a table that counts how often each goal (roughly: each package) has been involved in a conflict; whenever we make choices about what goal to solve next, we consult the table and prefer goals that have been involved in conflicts most often. Why is that good? Because it means the solver will try hard to spend time solving "difficult goals" that are most likely to cut off large amounts of the search space.

All tests for this have been immensely promising. Unlike the already existing --reorder-goals, using --count-conflicts has hardly any overhead. While there are some packages where --count-conflicts helps more and some where it helps less, it barely ever slows down solving. It barely ever affects the install plan being found (although in theory, it can, because changing the goal order can always lead to other install plans). It quite often leads to dramatic speedups though.

The universe package mentioned by @phadej in #3453 is one such example, were solver time goes from 160 to 2 seconds.

All in all, I'm confident enough about this to immediately make it the default.

The original problem in #3453 is unfortunately not solved by this, but I believe --count-conflicts is still an important step to solving that scenario too. The remaining problem is that some things interact badly with --count-conflicts:

  • Some of the heuristics in the solver also interact with the goal search order. In particular, there is a heuristic that "defers" solving certain flags, which means they're removed from the goal set. Once they're removed, they can no longer be preferred, even if they're at the core of a complex conflict. The solution is to make deferral less absolute, which is a next step.
  • Using --reorder-goals together with --count-conflicts is not recommended. Both strategies try to have the final word on the goal order, and they don't combine well. So one should use one or the other. I haven't seen many (any?) cases so far where --reorder-goals was better than --count-conflicts.

@kosmikus
Copy link
Contributor Author

/cc @grayjay @edsko @dcoutts @23Skidoo

@phadej
Copy link
Collaborator

phadej commented Jun 30, 2016

Seems that tests need updating.

@kosmikus kosmikus force-pushed the count-conflicts-2 branch from 1dfa8ba to 0c12435 Compare July 2, 2016 13:53
@kosmikus
Copy link
Contributor Author

kosmikus commented Jul 2, 2016

Rebased.

@grayjay
Copy link
Collaborator

grayjay commented Jul 3, 2016

I think this is a great improvement, and I agree that it should be the default.

I was wondering if there might be a better way to handle the ConflictMap state. I spent some time trying to understand the performance implications of adding a state monad to tree traversal, because I made a similar change in #2917. It seems like a space leak can easily develop when there is a pair holding a large data structure (solver log) and the state calculated from traversing that data structure. (The program can retain the pair and the whole data structure when it references the final state.)

I experimented with instead putting the state at the end of the log: grayjay@e7b2c3c. I think that the space behavior is more predictable, but it's not as clean. The state monad may be the best solution. I just wanted to put another idea out there.

I also looked at the change in the output of the solver unit tests, to check for tests that I missed in #3510. The only test that showed a change in goal order was indepGoals5. I could make a PR to freeze the goal order for that test, as well as indepGoals6, which looks similar.

Test log diff:

-  UnitTests.Distribution.Solver.Modular.Solver
-    Independent goals
 targets: X, Y
 constraints: 
-      indepGoals5:   stanzas X test (unknown source)
+  UnitTests.Distribution.Solver.Modular.Solver
+  stanzas X test (unknown source)
   stanzas Y test (unknown source)
   stanzas A test (unknown source)
+    Independent goals
   stanzas A test (unknown source)
   stanzas B test (unknown source)
   stanzas C test (unknown source)
-  stanzas C test (unknown source)
+      indepGoals5:   stanzas C test (unknown source)
 preferences: 
 strategy: PreferLatestForSelected
 reorder goals: ReorderGoals False
+count conflicts: CountConflicts True
 independent goals: IndependentGoals True
 avoid reinstalls: AvoidReinstalls False
 shadow packages: ShadowPkgs False
 strong flags: StrongFlags False
 max backjumps: infinite
 [__0] trying: 0.X-1.0.0 (user goal)
 [__1] trying: 0.A-2.0.0 (dependency of 0.X-1.0.0)
 [__2] trying: 0.B-1.0.0 (dependency of 0.A-2.0.0)
 [__3] trying: 0.C-2.0.0 (dependency of 0.X-1.0.0)
 [__4] trying: 1.Y-1.0.0 (user goal)
 [__5] trying: 1.A~>0.A-2.0.0 (dependency of 1.Y-1.0.0)
 [__6] trying: 1.B~>0.B-1.0.0 (dependency of 1.A-2.0.0)
 [__7] next goal: 1.C (dependency of 1.Y-1.0.0)
 [__7] rejecting: 1.C~>0.C-2.0.0 (conflict: 1.Y => 1.C==1.0.0)
 [__7] rejecting: 1.C-2.0.0 (multiple instances)
 [__7] rejecting: 1.C-1.0.0 (dependencies not linked: cannot make 1.C canonical member of {*0.C-2.0.0,1.C-2.0.0}; conflict set: 0.C, 1.B, 1.C)
 [__7] fail (backjumping, conflict set: 0.C, 1.B, 1.C, 1.Y)
 [__6] rejecting: 1.B-1.0.0 (multiple instances)
 [__6] fail (backjumping, conflict set: 0.B, 0.C, 1.A, 1.B, 1.C, 1.Y)
 [__5] rejecting: 1.A-2.0.0 (multiple instances)
 [__5] rejecting: 1.A-1.0.0 (conflict: 1.Y => 1.A==2.0.0)
 [__4] fail (backjumping, conflict set: 0.A, 0.B, 0.C, 1.A, 1.B, 1.C, 1.Y)
 [__3] rejecting: 0.C-1.0.0 (conflict: 0.X => 0.C==2.0.0)
 [__2] fail (backjumping, conflict set: 0.A, 0.B, 0.C, 0.X, 1.A, 1.B, 1.C, 1.Y)
 [__1] trying: 0.A-1.0.0
 [__2] trying: 0.C-2.0.0 (dependency of 0.X-1.0.0)
 [__3] trying: 1.Y-1.0.0 (user goal)
-[__4] next goal: 1.A (dependency of 1.Y-1.0.0)
-[__4] rejecting: 1.A~>0.A-1.0.0 (conflict: 1.Y => 1.A==2.0.0)
-[__4] trying: 1.A-2.0.0
-[__5] trying: 1.B-1.0.0 (dependency of 1.A-2.0.0)
-[__6] next goal: 1.C (dependency of 1.Y-1.0.0)
-[__6] rejecting: 1.C~>0.C-2.0.0 (conflict: 1.Y => 1.C==1.0.0)
-[__6] rejecting: 1.C-2.0.0 (multiple instances)
-[__6] trying: 1.C-1.0.0
+[__4] next goal: 1.C (dependency of 1.Y-1.0.0)
+[__4] rejecting: 1.C~>0.C-2.0.0 (conflict: 1.Y => 1.C==1.0.0)
+[__4] rejecting: 1.C-2.0.0 (multiple instances)
+[__4] trying: 1.C-1.0.0
+[__5] next goal: 1.A (dependency of 1.Y-1.0.0)
+[__5] rejecting: 1.A~>0.A-1.0.0 (conflict: 1.Y => 1.A==2.0.0)
+[__5] trying: 1.A-2.0.0
+[__6] next goal: 1.B (dependency of 1.A-2.0.0)
+[__6] trying: 1.B-1.0.0
 [__7] done
 OK

I'll make some comments on the code.

@@ -697,6 +701,7 @@ defaultFreezeFlags = FreezeFlags {
freezeSolver = Flag defaultSolver,
freezeMaxBackjumps = Flag defaultMaxBackjumps,
freezeReorderGoals = Flag (ReorderGoals False),
freezeCountConflicts = Flag (CountConflicts False),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't fetch and freeze use the same default solver parameters as install?

@grayjay
Copy link
Collaborator

grayjay commented Jul 4, 2016

@kosmikus Thanks for adding the comments.

I made a PR to handle the unit test that was affected by this change: #3517

@edsko
Copy link
Contributor

edsko commented Jul 4, 2016

I haven't followed the details of this thread, but FWIW:

I was wondering if there might be a better way to handle the ConflictMap state. I spent some time trying to understand the performance implications of adding a state monad to tree traversal, because I made a similar change in #2917. It seems like a space leak can easily develop when there is a pair holding a large data structure (solver log) and the state calculated from traversing that data structure. (The program can retain the pair and the whole data structure when it references the final state.)

I experimented with instead putting the state at the end of the log: grayjay/cabal@e7b2c3c. I think that the space behavior is more predictable, but it's not as clean. The state monad may be the best solution. I just wanted to put another idea out there.

As it happens I have experimented a lot with this sort of thing recently; have a half written blog post and a lot of experimental data that I have no time to put into some kind of coherent state :/ Anyway, in my opinion, putting the state at the end of the log both the most predictable and in fact the cleanest solution. It is very very easy to make mistakes and have memory blow up when using a monadic approach with this sort of thing.

@kosmikus
Copy link
Contributor Author

kosmikus commented Jul 4, 2016

@grayjay Had another look at your way to treat the conflict map, and I agree with @edsko that it's actually better to do it this way. So I cherry-picked your change and added it to this PR.

@grayjay
Copy link
Collaborator

grayjay commented Jul 4, 2016

LGTM

@kosmikus kosmikus merged commit e08d602 into haskell:master Jul 4, 2016
@kosmikus
Copy link
Contributor Author

kosmikus commented Jul 4, 2016

Ok. Merged.

@23Skidoo This one's probably tricky to merge into 1.24 because of the solver module reorganisation. On the other hand, it is a significant speedup for many situations, so depending on how far off 1.26 is, I might be willing to do the work if you'd be willing to put it in 1.24 as well.

@23Skidoo
Copy link
Member

23Skidoo commented Jul 5, 2016

@kosmikus I want to release 1.26 before the end of the year, likely in November/December.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants