run: run is computing checksums even though --no-exec is specified #5368
Hi @Honzys ! I'm not able to reproduce the issue with 1.11.13 or newer dvc. I see that the commands you've provided are not the actual commands that you've used (e.g. there is no command in the |
@efiop Thank you for your response. I am sorry, I wrote the reproducer as "pseudocode" just to show the basic idea. Unfortunately I cannot create an exact reproducer for this issue (since I cannot provide you with the dependency folder), but I can show you some other info. When I run the following command:
$ dvc run --verbose -n some_stage --deps folder_on_nas --wdir . --no-exec --force whatever_command_we_want
2021-02-01 19:36:15,560 DEBUG: Check for update is enabled.
2021-02-01 19:36:15,573 DEBUG: fetched: [(3,)]
2021-02-01 19:36:15,598 DEBUG: Assuming '/opt/dvc_cache/d3/a47aba9c5976a1635bce713740ac31' is unchanged since it is read-only
2021-02-01 19:36:15,599 DEBUG: Path '/home/dev/output.h5' inode '5980356654'
2021-02-01 19:36:15,599 DEBUG: fetched: [('1612034172963527168', '4917983', 'd3a47aba9c5976a1635bce713740ac31', '1612207342449105664')]
^C2021-02-01 19:38:03,992 DEBUG: fetched: [(76,)]
2021-02-01 19:38:04,026 ERROR: interrupted by the user
------------------------------------------------------------
Traceback (most recent call last):
File "/opt/venv/lib/python3.6/site-packages/dvc/main.py", line 90, in main
ret = cmd.run()
File "/opt/venv/lib/python3.6/site-packages/dvc/command/run.py", line 60, in run
desc=self.args.desc,
File "/opt/venv/lib/python3.6/site-packages/dvc/repo/__init__.py", line 54, in wrapper
return f(repo, *args, **kwargs)
File "/opt/venv/lib/python3.6/site-packages/dvc/repo/scm_context.py", line 4, in run
result = method(repo, *args, **kw)
File "/opt/venv/lib/python3.6/site-packages/dvc/repo/run.py", line 117, in run
if kwargs.get("run_cache", True) and stage.can_be_skipped:
File "/opt/venv/lib/python3.6/site-packages/dvc/stage/__init__.py", line 387, in can_be_skipped
if self.is_cached and not self.is_callback and not self.always_changed:
File "/opt/venv/lib/python3.6/site-packages/dvc/stage/__init__.py", line 681, in is_cached
return self.name in self.dvcfile.stages and super().is_cached
File "/opt/venv/lib/python3.6/site-packages/dvc/stage/__init__.py", line 405, in is_cached
self.save_deps()
File "/opt/venv/lib/python3.6/site-packages/dvc/stage/__init__.py", line 443, in save_deps
dep.save()
File "/opt/venv/lib/python3.6/site-packages/dvc/output/base.py", line 276, in save
self.hash_info = self.get_hash()
File "/opt/venv/lib/python3.6/site-packages/dvc/output/base.py", line 186, in get_hash
return self.tree.get_hash(self.path_info)
File "/opt/venv/lib/python3.6/site-packages/funcy/decorators.py", line 39, in wrapper
return deco(call, *dargs, **dkwargs)
File "/opt/venv/lib/python3.6/site-packages/dvc/tree/base.py", line 45, in use_state
return call()
File "/opt/venv/lib/python3.6/site-packages/funcy/decorators.py", line 60, in __call__
return self._func(*self._args, **self._kwargs)
File "/opt/venv/lib/python3.6/site-packages/dvc/tree/base.py", line 271, in get_hash
hash_info = self.state.get(path_info)
File "/opt/venv/lib/python3.6/site-packages/dvc/state.py", line 446, in get
actual_mtime, actual_size = get_mtime_and_size(path, self.tree)
File "/opt/venv/lib/python3.6/site-packages/dvc/utils/fs.py", line 40, in get_mtime_and_size
stats = tree.stat(file_path)
File "/opt/venv/lib/python3.6/site-packages/dvc/tree/local.py", line 162, in stat
return os.stat(path)
KeyboardInterrupt
------------------------------------------------------------
2021-02-01 19:38:04,062 DEBUG: Analytics is enabled.
2021-02-01 19:38:04,121 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpl552fx8n']'
2021-02-01 19:38:04,123 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpl552fx8n']'
As you can see, it was stuck for quite some time computing the hash (I guess for the dependencies provided) until I interrupted it. Since the dependency is a folder on our NAS with a lot of small images, computing its checksum is expected to take quite some time. But the question is: should the hash be computed even though --no-exec is specified? Shouldn't this line https://github.com/iterative/dvc/blob/1.11.13/dvc/repo/run.py#L117 be changed from
if kwargs.get("run_cache", True) and stage.can_be_skipped:
to
if not no_exec and kwargs.get("run_cache", True) and stage.can_be_skipped:
If I run the exact same stage with:
$ dvc run --verbose -n some_stage --deps folder_on_nas --wdir . --no-exec --no-run-cache --force whatever_command_we_want
2021-02-01 19:53:33,687 DEBUG: Check for update is enabled.
2021-02-01 19:53:33,700 DEBUG: fetched: [(3,)]
Modifying stage 'some_stage' in 'dvc.yaml'
To track the changes with git, run:
git add dvc.yaml
2021-02-01 19:53:34,154 DEBUG: fetched: [(76,)]
2021-02-01 19:53:34,190 DEBUG: Analytics is enabled.
2021-02-01 19:53:34,284 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp_4thwkz3']'
2021-02-01 19:53:34,286 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp_4thwkz3']'
IMHO checking the hashes is unnecessary since the stage shouldn't actually be run, because
Please let me know if you need more info on this, I am not sure that I described the issue clearly. |
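The effect of the proposed guard can be illustrated in isolation. The sketch below is not DVC code: `FakeStage` and `should_check_run_cache` are hypothetical names made up for illustration, and the counter merely stands in for the expensive dependency hashing seen in the traceback. It relies only on the fact that Python's `and` short-circuits, so `stage.can_be_skipped` is never evaluated when `no_exec` is true.

```python
class FakeStage:
    """Hypothetical stand-in for a stage; this is not DVC's real class."""

    def __init__(self):
        self.hash_checks = 0

    @property
    def can_be_skipped(self):
        # Stands in for the expensive path that hashes every dependency.
        self.hash_checks += 1
        return False


def should_check_run_cache(stage, no_exec, **kwargs):
    # `and` evaluates left to right and stops at the first falsy operand,
    # so `stage.can_be_skipped` is not touched when no_exec is True.
    return bool(
        not no_exec
        and kwargs.get("run_cache", True)
        and stage.can_be_skipped
    )


stage = FakeStage()

should_check_run_cache(stage, no_exec=True)
print(stage.hash_checks)  # 0: the expensive property was never evaluated

should_check_run_cache(stage, no_exec=False)
print(stage.hash_checks)  # 1: without the guard the property is evaluated

should_check_run_cache(stage, no_exec=False, run_cache=False)
print(stage.hash_checks)  # 1: run_cache=False short-circuits as well
```

The last call mirrors the --no-run-cache observation above: disabling the run cache skips the same property, which would be consistent with that command returning immediately.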
Thank you @Honzys ! That makes sense! That's an old piece of legacy caching that we were meaning to get rid of. We'll bypass it in 1.11.x and will likely get rid of it in 2.0 soon. Thank you for the feedback! 🙏 |
@Honzys Btw, would you like to submit a PR for 1.11 branch with that |
Yes, I can submit a PR. |
Prevent computing hashes of dependencies when no_exec is set. Fixes iterative#5368
Prevent computing hashes of dependencies when no_exec is set. Fixes #5368 Co-authored-by: Jan Stratil <[email protected]>
Bug Report
run: run cache is not ignored when --no-exec
Description
When you specify the --no-exec param, the dvc run command should be treated as if --no-run-cache was also specified. I guess it makes no sense to check the cache when --no-exec is toggled, or am I wrong?
The problem is that when I run dvc run --no-exec ... with a dependency pointing to a large folder, it takes a long time to execute this command. Also, after the stage is created and I actually want to run the stage using dvc repro, it takes a long time to execute that command too.
Our use case is that we generate all the stages with the --no-exec param before we run them for real. If we somehow change the pipeline, we rerun the "generation of all stages with the --no-exec param" and then run it for real again after it's refreshed.
Would it make sense to set --no-run-cache to True when --no-exec is specified? Or am I overlooking something?
Thank you very much!
Reproduce
Example:
dvc run --no-exec --force --deps folder_path -n stage
# This should be done in an instant
dvc repro stage
# This should take some time to compute checksums for dependencies
dvc run --no-exec --force --deps folder_path -n stage
# This should be done in an instant (but it actually takes some time)
Expected
I would expect that steps 2 and 4 would be completed in approximately the same amount of time.
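One rough way to check this expectation is to time each step from a small script. The helper below is only a sketch: `time_command` is a hypothetical name, and the command it runs here is a harmless placeholder so that the example works without dvc installed. The real dvc run and dvc repro invocations would have to be substituted by hand.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode, time.perf_counter() - start


# Placeholder command; replace with the argv of the step being measured.
code, elapsed = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, took {elapsed:.3f}s")
```

Comparing the elapsed times of the two --no-exec invocations would show whether they finish in roughly the same time.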
Environment information
Output of dvc version: