
cabal goes crazy on large Build-Depends lists #2548


Closed
amigalemming opened this issue Apr 22, 2015 · 19 comments

Comments

@amigalemming
Contributor

acme-everything is a package that depends on every other package on Hackage.

$ cabal fetch --no-dependencies acme-everything-2015.4.15.1
Downloading acme-everything-2015.4.15.1...
$ cabal unpack acme-everything-2015.4.15.1
Unpacking to acme-everything-2015.4.15.1/
^CAborted

I have to abort unpacking because it eats all my memory.
I first encountered this problem with my Hackage->BibTeX program, which converts all Cabal files from index.tar to BibTeX entries. That program only parses the Cabal files, so I suspect that the huge Build-Depends list alone is already too much for the parser.
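For reference, here is a minimal parse-only program along those lines. This is only a sketch against the pre-2.0 Cabal API that cabal-install 1.22 uses; point it at acme-everything.cabal:

    -- Sketch only: parsePackageDescription is the ReadP-based parser
    -- from Distribution.PackageDescription.Parse (Cabal < 2.0).
    -- Parsing alone is enough; no dependency resolution is involved.
    import Distribution.PackageDescription.Parse
             (ParseResult (..), parsePackageDescription)
    import System.Environment (getArgs)

    main :: IO ()
    main = do
      [cabalFile] <- getArgs
      contents <- readFile cabalFile
      case parsePackageDescription contents of
        ParseFailed err  -> print err
        ParseOk _warns _ -> putStrLn "parsed OK"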

@ttuegel
Member

ttuegel commented Apr 23, 2015

@amigalemming Could you produce a profiling report, as described here? Because you're using Cabal as a library, you will need to build your own program with profiling as well.

@amigalemming
Contributor Author

On Thu, 23 Apr 2015, Thomas Tuegel wrote:

@amigalemming Could you produce a profiling report, as described here?
Because you're using Cabal as a library, you will need to build your own
program with profiling as well.

Can you reproduce and confirm the problem with the cabal-install
command line I reported?

@ttuegel ttuegel added this to the Cabal-1.24 milestone Apr 23, 2015
@ttuegel
Member

ttuegel commented Apr 23, 2015

Here is the profiling report I produced:

    Thu Apr 23 16:04 2015 Time and Allocation Profiling Report  (Final)

       cabal +RTS -s -p -RTS install acme-everything --dry-run

    total time  =        3.42 secs   (3418 ticks @ 1000 us, 1 processor)
    total alloc = 3,229,722,296 bytes  (excludes profiling overheads)

COST CENTRE                                  MODULE                                       %time %alloc

packageIndexFromCache.readPackageDescription Distribution.Client.IndexUtils                70.2   81.7
MAIN                                         MAIN                                           8.5    0.0
readIndexCacheEntry.\                        Distribution.Client.IndexUtils                 3.2    2.7
fromList                                     Distribution.Client.PackageIndex               2.9    1.7
groupMap                                     Distribution.Client.Dependency.Modular.Index   2.8    1.6
getInstalledPackages                         Distribution.Client.IndexUtils                 2.0    3.0
fromList.fixBucket                           Distribution.Client.PackageIndex               1.4    0.6
packageIndexFromCache.accum                  Distribution.Client.IndexUtils                 1.3    1.7

(I have truncated it to fit GitHub's comment length restrictions, but nothing interesting was omitted.) As we expected, readPackageDescription is the primary offender. As reported in #2276 and #1127, the performance of all of Cabal's parsers is abysmal due to the use of ReadP.
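To illustrate the ReadP pathology with a standalone toy (this is not Cabal's actual grammar, just the shape of the problem): ReadP's choice is symmetric, so combinators like sepBy keep every partial parse alive, and even a plain comma-separated list yields one candidate result per possible prefix before the caller picks the longest parse.

    -- Toy demonstration: parsing n comma-separated items with ReadP
    -- produces n+1 results (every prefix is a valid parse), so time
    -- and allocation grow quadratically in the length of the list.
    import Text.ParserCombinators.ReadP
    import Data.Char (isAlphaNum)

    item :: ReadP String
    item = munch1 (\c -> isAlphaNum c || c == '-')

    commaList :: ReadP [String]
    commaList = item `sepBy` char ','

    main :: IO ()
    main = do
      let input = concat (replicate 9999 "p,") ++ "p"  -- 10,000 "dependencies"
      -- readP_to_S returns every parse; forcing the list is already costly.
      print (length (readP_to_S commaList input))      -- prints 10001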

@dcoutts What is the status of the improved parser? I noticed that ghc-7.10.1 no longer depends on the Cabal library; does this mean we need only wait 3 years to depend on parsec? ;)

@23Skidoo
Member

does this mean we need only wait 3 years to depend on parsec?

No, why? Old GHCs won't suddenly start depending on Cabal HEAD. We should be free to start using Parsec right now.

@amigalemming
Contributor Author

On Thu, 23 Apr 2015, Thomas Tuegel wrote:

Here is the profiling report I produced:

Thu Apr 23 16:04 2015 Time and Allocation Profiling Report  (Final)

   cabal +RTS -s -p -RTS install acme-everything --dry-run

total time  =        3.42 secs   (3418 ticks @ 1000 us, 1 processor)
total alloc = 3,229,722,296 bytes  (excludes profiling overheads)

This means your call completes in 3.42 seconds? Here it does not complete
in reasonable time, but quickly begins thrashing. I am using:

$ cabal --version
cabal-install version 1.22.2.0
using version 1.22.2.0 of the Cabal library

Are you sure you are installing acme-everything-2015.4.15.1 and not 2015.4.15,
which depends on far fewer packages?

@ttuegel
Member

ttuegel commented Apr 24, 2015

This means your call completes in 3.42 seconds? Here it does not complete in reasonable time, but quickly begins thrashing. I am using:

It does not complete here. After about 3 seconds CPU time I have to kill it because the OS is paging out everything else. My terminal emulator is usually the first thing paged out, for some reason, and as a result, the whole process takes about 10 minutes wall time.

@amigalemming
Contributor Author

On Fri, 24 Apr 2015, Thomas Tuegel wrote:

  This means your call completes in 3.42 seconds? Here it does not complete in reasonable time, but
  quickly begins thrashing. I am using:

It does not complete here. After about 3 seconds CPU time I have to kill it because the OS is paging out
everything else. My terminal emulator is usually the first thing paged out, for some reason, and as a result,
the whole process takes about 10 minutes wall time.

That is, you can reproduce the problem. Do you still need a profiled run
from my side?

@ttuegel
Member

ttuegel commented Apr 24, 2015

That is, you can reproduce the problem. Do you still need a profiled run from my side?

No, it seems like the problem is where we thought it would be. There's not much to do besides wait for the new parser. Thanks!

@ttuegel
Member

ttuegel commented Apr 24, 2015

No, why? Old GHCs won't suddenly start depending on Cabal HEAD. We should be free to start using Parsec right now.

Sorry, I was being a little sarcastic. Still, we have to bundle our own version of binary because GHC < 7.8 ships a version that is too old. Why is parsec different?

@23Skidoo
Member

Because Parsec does not come with GHC?

@ttuegel
Member

ttuegel commented Apr 24, 2015

Sorry, now I'm completely lost. The situation with binary is that we must bundle our own version so that users of GHC < 7.8 don't have to download and install it themselves. But you're saying that with Parsec we don't have to bundle our own version, because users can download and install it themselves?

@23Skidoo
Member

If we make Cabal depend on Parsec, users will have to download it themselves. Waiting N years won't help, because Parsec is not going to be included with GHC until Cabal starts depending on it. The workaround for binary makes sense because binary is included with GHC and we want to minimise the amount of preliminary work required for building Cabal. Though if we add a bootstrap script, that will be less of an issue.

If you're saying that we should package Parsec with Cabal, I think that's a bad idea: unlike with binary, we'd basically have to add the whole Parsec library to our tree, and we wouldn't be able to conditionally switch to a newer version if one is installed, as we do with binary.
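For reference, the binary arrangement is a compat shim along these lines (a sketch only; the module layout and macro name are illustrative, not Cabal's verbatim source):

    {-# LANGUAGE CPP #-}
    -- One module re-exports either the installed library or bundled
    -- fallback code, selected by a CPP macro that the .cabal file
    -- defines when the bundled copy is in use.
    module Distribution.Compat.Binary (Binary (..), encode, decode) where

    #ifdef BUNDLED_BINARY
    -- Bundled minimal implementation, for GHCs that ship an old binary.
    import Distribution.Compat.Binary.Class   (Binary (..))
    import Distribution.Compat.Binary.Generic (encode, decode)
    #else
    -- A new enough binary is installed; simply re-export it.
    import Data.Binary (Binary (..), encode, decode)
    #endif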

@ttuegel
Member

ttuegel commented Apr 24, 2015

If you're saying that we should package Parsec with Cabal, I think that's a bad idea: unlike with binary, we'd basically have to add the whole Parsec library to our tree, and we wouldn't be able to conditionally switch to a newer version if one is installed, as we do with binary.

No, I would never suggest that. I was very uneasy about including binary; I gave serious thought to reverting my binary persistent build configuration patches because of the need to include it. In the end, I decided I preferred (very slightly) not to wait three years. But when we depend on Parsec, which we absolutely must do, and as soon as possible, our users will complain that Cabal is not buildable with an out-of-the-box, fresh GHC install.

I think the best way forward on this front is to start distributing "super-sdist" tarballs with Cabal and its dependencies so that if cabal-install is unavailable, there is a known-good set of our dependencies available. There was a ticket around here somewhere to add a command for that, which I intended to work on for the next release, but now I can't find it for the life of me.

@23Skidoo
Member

I think the best way forward on this front is to start distributing "super-sdist" tarballs with Cabal and its dependencies so that if cabal-install is unavailable, there is a known-good set of our dependencies available.

Agreed, but such über-packages will need to be buildable with old versions of Cabal for this to work.

@gbaz
Collaborator

gbaz commented Apr 28, 2015

If only portions of the parsing are deeply terrible, do we really need to move to parsec, or can we just hand-roll some recursive descent stuff just for this case?

@ttuegel
Member

ttuegel commented Apr 28, 2015

If only portions of the parsing are deeply terrible, do we really need to move to parsec, or can we just hand-roll some recursive descent stuff just for this case?

I don't think the problem here is actually the size of the build-depends field, but rather the number of package descriptions that must be parsed. The parser we have now is ReadP-based; there isn't much we can do to salvage its performance. Anyway, the reason I'm pushing for the parsec parser is (if I understand correctly) it's already written, and just waiting to be merged.
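For contrast with the ReadP toy above, here is the same comma-separated list in Parsec; again a sketch, not the pending parser's real grammar. Parsec commits to each token as it consumes it, so it returns a single parse in linear time:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    item :: Parser String
    item = many1 (alphaNum <|> char '-')

    commaList :: Parser [String]
    commaList = item `sepBy` char ','

    main :: IO ()
    main = do
      let input = concat (replicate 9999 "p,") ++ "p"
      -- One deterministic result instead of one result per prefix.
      case parse (commaList <* eof) "<build-depends>" input of
        Left err -> print err
        Right ps -> print (length ps)  -- prints 10000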

@DemiMarie

Another option is to use the Happy parser generator (used by GHC itself). Since Happy generates Haskell code, the generated sources could be distributed.

@23Skidoo 23Skidoo modified the milestones: Cabal 1.24, Cabal 1.26 Feb 21, 2016
@ezyang ezyang modified the milestone: Cabal 2.0 Sep 6, 2016
@hvr
Member

hvr commented Aug 15, 2017

FYI, the just-merged #4654 has had a tremendous effect on heap allocations. Here's a (totally unfair) heap-profile comparison, doing cabal unpack acme-everything + cabal check before and after PR #4654:

[heap profile comparison chart: cabal-parsec, before vs. after #4654]

I think we can consider this issue fixed...? ;-)


PS: Here's the +RTS -s output:

No errors or warnings could be found in the package.
   2,106,220,656 bytes allocated in the heap
   4,030,194,928 bytes copied during GC
     604,489,760 bytes maximum residency (17 sample(s))
       2,357,216 bytes maximum slop
            1449 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2001 colls,     0 par    2.784s   2.781s     0.0014s    0.5176s
  Gen  1        17 colls,     0 par    0.009s   0.009s     0.0005s    0.0007s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    0.881s  (  0.880s elapsed)
  GC      time    2.792s  (  2.790s elapsed)
  EXIT    time    0.038s  (  0.039s elapsed)
  Total   time    3.712s  (  3.710s elapsed)

  Alloc rate    2,389,855,720 bytes per MUT second

  Productivity  24.8% of total user, 24.8% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

vs

No errors or warnings could be found in the package.
     314,662,704 bytes allocated in the heap
      51,218,184 bytes copied during GC
       7,010,960 bytes maximum residency (8 sample(s))
          66,928 bytes maximum slop
              16 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       298 colls,     0 par    0.064s   0.064s     0.0002s    0.0081s
  Gen  1         8 colls,     0 par    0.002s   0.002s     0.0002s    0.0005s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.486s  (  0.485s elapsed)
  GC      time    0.065s  (  0.065s elapsed)
  EXIT    time    0.001s  (  0.009s elapsed)
  Total   time    0.553s  (  0.560s elapsed)

  Alloc rate    647,749,351 bytes per MUT second

  Productivity  88.0% of total user, 88.2% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

@23Skidoo
Member

Closing as fixed by #4654.
