Garbage Collect - Unexpected Error - Assertion Error - Troubles with changing dvc remotes? #4062

Closed
rabefabi opened this issue Jun 17, 2020 · 2 comments
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments

@rabefabi
Bug Report

Please provide information about your setup

Output of dvc version:

$ dvc version
DVC version: 0.94.0
Python version: 3.7.6
Platform: Linux-4.15.0-99-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Supported remotes: http, https, s3, ssh
Cache: reflink - not supported, hardlink - supported, symlink - supported
Repo: dvc, git

Additional Information (if any):

I ran garbage collection for the first time in my project and hit the following error.

(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ dvc status                                                                           
Data and pipelines are up to date.                                      
(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ dvc gc -aTv
2020-06-17 09:20:15,856 WARNING: This will remove all cache except items used in the working tree and all git branches and tags of the current repo.
Are you sure you want to proceed? [y/n] y
2020-06-17 09:20:21,135 DEBUG: PRAGMA user_version;                     
2020-06-17 09:20:21,136 DEBUG: fetched: [(3,)]
2020-06-17 09:20:21,136 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-06-17 09:20:21,137 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-06-17 09:20:21,137 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-06-17 09:20:21,137 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-06-17 09:20:21,138 DEBUG: PRAGMA user_version = 3;
2020-06-17 09:20:30,448 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/11/127bb35fbb6b37c554139ae5cb3c35.dir' is unchanged since it is read-only
2020-06-17 09:20:30,449 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/11/127bb35fbb6b37c554139ae5cb3c35.dir' is unchanged since it is read-only
2020-06-17 09:20:31,259 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/03/4a5618f598d1602e8181c635409292.dir' is unchanged since it is read-only
2020-06-17 09:20:31,259 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/03/4a5618f598d1602e8181c635409292.dir' is unchanged since it is read-only
2020-06-17 09:20:31,267 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/94/2889f6d61e03865be5ebd66518db3e.dir' is unchanged since it is read-only
2020-06-17 09:20:31,268 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/94/2889f6d61e03865be5ebd66518db3e.dir' is unchanged since it is read-only
2020-06-17 09:20:31,268 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/e9/312f039663dcbc363e03dad8a1bf4c.dir' is unchanged since it is read-only
2020-06-17 09:20:31,268 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/e9/312f039663dcbc363e03dad8a1bf4c.dir' is unchanged since it is read-only
2020-06-17 09:20:31,588 DEBUG: SELECT count from state_info WHERE rowid=?
2020-06-17 09:20:31,589 DEBUG: fetched: [(180911,)]
2020-06-17 09:20:31,589 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-06-17 09:20:31,599 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/command/gc.py", line 59, in run
    workspace=self.args.workspace,
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/repo/__init__.py", line 30, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/repo/gc.py", line 73, in gc
    jobs=jobs,
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/repo/__init__.py", line 295, in used_cache
    filter_info=filter_info,
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/stage/__init__.py", line 761, in get_used_cache
    cache.update(out.get_used_cache(*args, **kwargs))
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/output/base.py", line 449, in get_used_cache
    self.checksum, self._collect_used_dir_cache(**kwargs),
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/output/base.py", line 363, in _collect_used_dir_cache
    if self.cache.changed_cache_file(self.checksum):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/remote/base.py", line 805, in changed_cache_file
    if self.is_protected(cache_info):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/remote/local.py", line 719, in is_protected
    if not self.exists(path_info):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/remote/local.py", line 97, in exists
    assert is_working_tree(self.repo.tree)
AssertionError
------------------------------------------------------------

The debug lines tell me that some cache entries are read-only, and it's quite possible that I botched something during my server setup (JupyterLab inside a Docker container, accessing host directories via a Docker bind mount). However, I tried manually granting group write access to the cache dir, which did not resolve the error:

(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ sudo chmod -R g+w .dvc/cache/
(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ ls -la .dvc/cache/
total 12496
drwxrwxr-x 258 rabefabi users   4096 Jun 17 09:27 .
drwxr-xr-x   4 rabefabi users   4096 Jun 17 09:31 ..
drwxrwxr-x   2 rabefabi users  20480 Jun 17 09:19 00
drwxrwxr-x   2 rabefabi users  24576 Jun 17 09:19 01
drwxrwxr-x   2 rabefabi users  20480 Jun 17 09:19 02
drwxrwxr-x   3 rabefabi users  20480 Jun 17 09:19 03
...
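
The DEBUG lines above ("unchanged since it is read-only") suggest that the read-only state belongs to individual cache files rather than to the top-level directories shown by `ls -la`. A minimal sketch for listing which files under a cache directory carry no write bit at all; the function name `split_by_writability` is hypothetical, and this is not DVC's own check, only a plain permission scan:

```python
import os
import stat
from pathlib import Path


def split_by_writability(cache_dir):
    """Walk cache_dir and split regular files into (read_only, writable) lists.

    A file counts as read-only here when none of the user, group, or
    other write bits is set. This is an illustrative definition, not DVC's.
    """
    read_only, writable = [], []
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = Path(root) / name
            mode = path.stat().st_mode
            if mode & write_bits:
                writable.append(path)
            else:
                read_only.append(path)
    return read_only, writable


if __name__ == "__main__":
    # ".dvc/cache" is the cache location seen in the session above.
    ro, rw = split_by_writability(".dvc/cache")
    print(f"{len(ro)} read-only files, {len(rw)} writable files")
```

Running it before and after a `chmod` shows whether the permission change actually reached the files inside the two-character subdirectories.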

Also, I recently switched DVC remotes (from S3 to SSH), so most of the cached files were created while the S3 remote was configured. Could this be relevant? The S3 remote is no longer accessible.

Thanks in advance!

@ghost ghost added the triage Needs to be triaged label Jun 17, 2020
@skshetry
Collaborator

Hi @rabefabi, thanks for reporting. This issue has already been fixed, and the fix will ship with the 1.0 release, which we are preparing for this week. In the meantime, you should be able to use a beta release without any issues.

Related: #3857 #3812

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Jun 17, 2020
@ghost ghost removed the triage Needs to be triaged label Jun 17, 2020
@rabefabi
Author

Hi @skshetry , thank you for the quick response.

Upgrading did solve the issue. In case anyone else encounters it:

(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ pip install --upgrade --pre dvc
[...]
(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ dvc gc -aT
[garbage collection happens]
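
For anyone unsure whether their environment already has the fix, here is a rough sketch that reads the installed `dvc` version and flags anything older than 1.0. The helper names are hypothetical, `importlib.metadata` needs Python 3.8 or newer, and treating "major version below 1" as "affected" is an assumption drawn only from the comments above (0.94.0 fails, the 1.0 pre-release works):

```python
from importlib import metadata  # standard library since Python 3.8


def predates_1_0(version):
    """Return True if the version string's major component is below 1."""
    major = version.split(".", 1)[0]
    return int(major) < 1


def installed_dvc_version():
    """Return the installed dvc version string, or None if dvc is absent."""
    try:
        return metadata.version("dvc")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_dvc_version()
    if version is None:
        print("dvc is not installed in this environment")
    elif predates_1_0(version):
        # Assumption: pre-1.0 releases may still hit the AssertionError.
        print(f"dvc {version} predates 1.0; try: pip install --upgrade --pre dvc")
    else:
        print(f"dvc {version} is 1.0 or newer")
```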

Thanks again!
