Skip to content

[nix-local-build] Recompilation avoidance ABA problem #3179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ezyang opened this issue Feb 21, 2016 · 11 comments
Open

[nix-local-build] Recompilation avoidance ABA problem #3179

ezyang opened this issue Feb 21, 2016 · 11 comments
Labels
cabal-install: nix-local-build old-milestone: ⊥ Moved from https://github.com/haskell/cabal/milestone/5 type: bug
Milestone

Comments

@ezyang
Copy link
Contributor

ezyang commented Feb 21, 2016

Steps to reproduce:

  1. Build a project with new-build
  2. Modify a file in the project
  3. Trigger a recompilation, but before GHC is done executing, C^C it
  4. Revert the modified file to its old version
  5. Trigger a recompilation

Expected result: Cabal recompiles

Actual result: Cabal reports that there is nothing to do.

Ostensibly, the problem is that before we start a recompile, we need to invalidate the old cached entry until the recompile finishes, so that we can handle an interrupt.

@dcoutts
Copy link
Contributor

dcoutts commented Apr 12, 2016

So if GHC had not modified any of the .o files then in principle saying there's nothing to do is correct since we are back to the old state. But the point is the interrupted build has now put the outputs into an inconsistent state. Perhaps we just want to blow away the cache if the action fails.

@ezyang
Copy link
Contributor Author

ezyang commented Apr 12, 2016

Yes, I think the cache should be removed. I'd be very surprised if, e.g., this wasn't what Shake does (haven't checked though.) (Maybe @ndmitchell knows.)

@ndmitchell
Copy link

Shake checks both the inputs and the outputs are "the same", so the first time the input has changed but the output hasn't. The second time round the output has changed but the input hasn't. Either way, it's a difference of opinion, so it rebuilds.

In addition, if Shake rebuilds a bit and fails it writes a failure entry into the log, basically deleting all information about that entry, so it must rebuild next time. If you hit kill -9 you might not get that entry in the log, but otherwise you probably will.

It sounds like both those solutions solve the problem, but the first is probably easier for Cabal to integrate.

@ezyang
Copy link
Contributor Author

ezyang commented Apr 12, 2016

Actually, I'd say the second is easier, because Cabal's current rebuild system doesn't know anything about outputs.

@ndmitchell
Copy link

Does Cabal's current rebuild system have any form of log about what has happened? Adding such a log can be difficult if it isn't there already.

@ezyang
Copy link
Contributor Author

ezyang commented Apr 12, 2016

No, it's a very simple system (I keep telling Duncan we should just use Shake, but there's a lot of legacy code so the current rebuild interface helped us avoid rewriting everything in Cabal). The interface looks something like this (the API of which, BTW, I find HELLA confusing):

-- | Filesystem based "keys" which we can monitor for changes, these are keys in Shake's model
data MonitorFilePath
monitorFile :: FilePath -> MonitorFilePath
monitorFileHashed :: FilePath -> MonitorFilePath
monitorNonExistentFile :: FilePath -> MonitorFilePath
monitorFileGlob :: FilePathGlob -> MonitorFilePath
-- etc...

-- | A monitor for detecting changes to a set of files. In the Shake-world, this is like
-- a non-wildcard rule, except that file monitors don't contain the code necessary to rerun the
-- rule (so unlike Shake, there is no database of rules; you just write old-fashioned
-- imperative code to do your full build). The dependencies of this "rule" are 'MonitorFilePath', plus
-- a Haskell key a. The result type of the rule is b. Every rule is associated with its own
-- unique file path which is where the state associated with that rule is cached. Note that
-- the FileMonitor is not a database of keys to values; the "key" is really just a Haskell
-- value which is also considered a dependency.
data FileMonitor a b
newFileMonitor :: Eq a => FilePath -> FileMonitor a b

-- | A monad for tracking 'MonitorFilePath's when building files.
data Rebuild a
instance Monad Rebuild
instance MonadIO Rebuild
monitorFiles :: [MonitorFilePath] -> Rebuild ()
runRebuild :: Rebuild a -> IO a
-- | Given a monitor, check if 'a' matches, and none of the recorded files in the monitor
-- have changed. If these are all unchanged, return the cached value 'b'. Otherwise,
-- rerun the 'Rebuild b' action.
rerunIfChanged :: (Binary a, Binary b) => FileMonitor a b -> a -> Rebuild b -> Rebuild b

@hvr
Copy link
Member

hvr commented Sep 26, 2016

I've got a related issue which I'm not sure if it's an instance of this one or a different issue:

Steps to reproduce; in Cabal's source-tree:

  1. run cabal new-build exe:cabal in a clean tree so that cabal considers exe:cabal up to date

  2. modify Cabal/Distribution/Simple.hs such that lib:Cabal needs rebuilding

  3. run cabal new-build exe:cabal -j1 and interrupt it while hackage-security is being built:

$ cabal new-build exe:cabal -j1
In order, the following will be built (use -v for more details):
 - Cabal-1.25.0.0 (lib) (file Distribution/Simple.hs changed)
 - hackage-security-0.5.2.2 (lib) (dependency rebuilt)
 - cabal-install-1.25.0.0 (exe:cabal) (dependency rebuilt)
Preprocessing library Cabal-1.25.0.0...
[114 of 114] Compiling Distribution.Simple ( Distribution/Simple.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/Cabal-1.25.0.0/build/Distribution/Simple.o )
Preprocessing library hackage-security-0.5.2.2...
[1 of 1] Compiling Main             ( /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/setup/setup.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/setup/Main.o ) [Distribution.Simple changed]
Linking /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/setup/setup ...
Preprocessing executable 'cabal' for cabal-install-1.25.0.0...
[ 67 of 126] Compiling Distribution.Client.Utils ( Distribution/Client/Utils.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/build/cabal/cabal-tmp/Distribution/Client/Utils.o ) [Distribution.Compat.Environment changed]
[ 68 of 126] Compiling Distribution.Client.FileMonitor ( Distribution/Client/FileMonitor.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/build/cabal/cabal-tmp/Distribution/Client/FileMonitor.o ) [Distribution.Compat.Time changed]
[ 69 of 126] Compiling Distribution.Client.RebuildMonad ( Distribution/Client/RebuildMonad.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/build/cabal/cabal-tmp/Distribution/Client/RebuildMonad.o ) [Distribution.Simple.Utils changed]
[ 70 of 126] Compiling Distribution.Client.Win32SelfUpgrade ( Distribution/Client/Win32SelfUpgrade.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/build/cabal/cabal-tmp/Distribution/Client/Win32SelfUpgrade.o ) [Distribution.Simple.Utils changed]
[ 71 of 126] Compiling Distribution.Client.Types ( Distribution/Client/Types.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/cabal-install-1.25.0.0/build/cabal/cabal-tmp/Distribution/Client/Types.o ) [Distribution.Simple.Utils changed]
^C
  1. repeat invocation from 3)
$ cabal new-build exe:cabal -j1
Up to date

where cabal falsely believes that exe:cabal is up to date

EDIT: the problem appears to be that new-build marks invalidated rev-deps which need rebuilding... but doesn't persist that to the filesystem; that way on the next run cabal lost the state about invalidated rev-deps or in @dcoutts' words:

the precise situation, A depends on B, edit B, rebuild A, abort after building B but while building A
which leaves the B build cache updated, and the A build cache unmodified, leading to the file cache check saying nothing has changed and the solution is the invalidation we do internally has to be persisted

@ezyang
Copy link
Contributor Author

ezyang commented Oct 1, 2016

Milestoned as a must-fix when we switch over to default; but I am not sure if we are going to manage to fix this for 2.0.

@hvr
Copy link
Member

hvr commented Oct 1, 2016

UPDATE I can't reproduce with the instructions below with a current cabal 2.5 snapshot; it appears that at some point the file tracking was extended to add the cache/build files as file-tracking deps; the original issue still stands though (i.e. the cache state not being invalidated if ghc terminates abnormaly)

Here's an even easier way to reproduce w/o using CTRL-C:

  1. and 2) ...as above...

  2. rebuild only the lib via cabal new-build lib:Cabal

$ cabal new-build lib:Cabal
In order, the following will be built (use -v for more details):
 - Cabal-1.25.0.0 (lib) (file Distribution/Package.hs changed)
Preprocessing library Cabal-1.25.0.0...
....
[117 of 117] Compiling Distribution.Simple ( Distribution/Simple.hs, /stuff3/work/GitHub/cabal/dist-newstyle/build/x86_64-linux/ghc-8.0.1/Cabal-1.25.0.0/build/Distribution/Simple.o ) [Distribution.Package changed]
  1. now ask to rebuild the cabal exe via cabal new-build exe:cabal
$ cabal new-build exe:cabal
Up to date

@dcoutts
Copy link
Contributor

dcoutts commented Oct 1, 2016

My vague idea is to solve it by extending the file monitor stuff with the ability to invalidate.

@fgaz
Copy link
Member

fgaz commented May 22, 2020

Unassigning this from myself since it was part of https://github.com/haskell/cabal/projects/7 and I'm not working on it now

@fgaz fgaz removed their assignment May 22, 2020
@emilypi emilypi removed the meta: hvr label Aug 11, 2021
@ulysses4ever ulysses4ever modified the milestones: Considered for 3.4, Jun 1, 2022
@andreabedini andreabedini added the old-milestone: ⊥ Moved from https://github.com/haskell/cabal/milestone/5 label Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cabal-install: nix-local-build old-milestone: ⊥ Moved from https://github.com/haskell/cabal/milestone/5 type: bug
Projects
None yet
Development

No branches or pull requests

10 participants