Skip to content

refactor: consolidate tree classes #4050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pmrowla opened this issue Jun 16, 2020 · 1 comment
Closed

refactor: consolidate tree classes #4050

pmrowla opened this issue Jun 16, 2020 · 1 comment
Assignees
Labels
p3-nice-to-have It should be done this or next sprint refactoring Factoring and re-factoring

Comments

@pmrowla
Copy link
Contributor

pmrowla commented Jun 16, 2020

Follow up to #3882.

Currently we have tree classes that all inherit from dvc.scm.tree.BaseTree spread out all over the code base w/DvcTree, RepoTree, SCM/git related trees, BaseRemoteTree, and all of the associated cloud remote trees. It may make more sense to group these in something like dvc.tree and dvc.tree.cloud. This would also allow us to simplify what is currently under dvc.remote, since we really only need single files for Remote and CloudCache now. (Cloud cache code could also potentially just be moved into dvc.cache now as well.)

@pmrowla pmrowla added refactoring Factoring and re-factoring p3-nice-to-have It should be done this or next sprint labels Jun 16, 2020
@efiop efiop assigned efiop and unassigned efiop Jun 22, 2020
efiop added a commit to efiop/dvc that referenced this issue Jun 22, 2020
GitPython is not threadsafe, which was causing issues when we were
computing hash for a directory.

Fixes iterative#4079

Related to tree generalization iterative#4050
efiop added a commit that referenced this issue Jun 22, 2020
GitPython is not threadsafe, which was causing issues when we were
computing hash for a directory.

Fixes #4079

Related to tree generalization #4050
@efiop efiop self-assigned this Jul 14, 2020
efiop added a commit to efiop/dvc that referenced this issue Jul 14, 2020
Outputs and dependencies don't really care about the remotes, as they
don't do any pushing or pulling. What they really need are the trees,
so let's use them directly.

Part of iterative#4050
efiop added a commit that referenced this issue Jul 15, 2020
Outputs and dependencies don't really care about the remotes, as they
don't do any pushing or pulling. What they really need are the trees,
so let's use them directly.

Part of #4050
efiop added a commit to efiop/dvc that referenced this issue Jul 15, 2020
efiop added a commit that referenced this issue Jul 15, 2020
* dvc: remove tree-specific wrappers from remote

This helps combat the remote/tree/cache confusion.

* dvc: split remotes from trees

Related to #4050
@efiop efiop mentioned this issue Jul 16, 2020
3 tasks
efiop added a commit to efiop/dvc that referenced this issue Jul 16, 2020
Working tree is really just a regular local tree and should be
used by outputs when trying to compute a hash for themselves.
We didn't use it previously because local tree was embeded into
the local remote class.

Related to iterative#4050
efiop added a commit to efiop/dvc that referenced this issue Jul 16, 2020
Working tree is really just a regular local tree and should be
used by outputs when trying to compute a hash for themselves.
We didn't use it previously because local tree was embeded into
the local remote class.

Related to iterative#4050
efiop added a commit that referenced this issue Jul 16, 2020
Working tree is really just a regular local tree and should be
used by outputs when trying to compute a hash for themselves.
We didn't use it previously because local tree was embeded into
the local remote class.

Related to #4050
@efiop efiop mentioned this issue Jul 16, 2020
3 tasks
efiop added a commit to efiop/dvc that referenced this issue Jul 19, 2020
@efiop efiop mentioned this issue Jul 19, 2020
3 tasks
efiop added a commit to efiop/dvc that referenced this issue Jul 19, 2020
efiop added a commit that referenced this issue Jul 19, 2020
efiop added a commit to efiop/dvc that referenced this issue Jul 19, 2020
efiop added a commit that referenced this issue Jul 19, 2020
@efiop
Copy link
Contributor

efiop commented Jul 19, 2020

Ok, I think we are done for now. Core code is separated, but there might be some issues like #4244 which were there before this, but we weren't able to see them. We'll have to fix them as we go, especially since we are trying to use trees more and more and those bugs are getting more and more obvious.

I didn't move the tests around for now, because they are intermixed and it doesn't seem worth it to bother with in this sprint. But it is a technical debt, so we'll have to get to them at some point in the future. Luckily, they are easy to separate when writing new tests (similar how we are doing with unit -> pytest migration). Plus, cache and remote classes still have some cloud-specific implementations (e.g. local and ssh where we use some sftp channel magic) which will likely be slimmed down once we unify tree interfaces (see comment about it below).

Next big thing here is to, of course, optimize dvcignore (something that is on our radar right now, has a separate ticket and PR), but , even more importantly, we will need to unify interfaces of our tree classes better. E.g. make them all raise appropriate OSErrors (or something else if we will be able to find something better) so we are able to handle them in a unified way. Same with return types, where it would be great to use type hints and use mypy to check them.

@efiop efiop closed this as completed Jul 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p3-nice-to-have It should be done this or next sprint refactoring Factoring and re-factoring
Projects
None yet
Development

No branches or pull requests

2 participants