tree: use trie in DvcTree (or in Repo) #4847

Closed
efiop opened this issue Nov 5, 2020 · 1 comment
Labels
p2-medium (Medium priority, should be done, but less important), performance (improvement over resource / time consuming tasks), refactoring (Factoring and re-factoring)

Comments


efiop commented Nov 5, 2020

We only use outputs there, and we have the metadata now, so it makes sense to construct the trie from the repo DAG and work with it internally instead of doing DAG lookups. It is potentially worth doing the same thing in the repo itself, as a trie is often handier than a DAG (though we are already using the tree, not a trie, in such places). A single repo trie with both outs and deps might be a solution too.

This became relevant while working on abstracting dir_info away into its own class and analysing the way we do DvcTree.walk() now (we use a trie there already).
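
To make the idea concrete, here is a rough sketch of an outputs trie keyed by path parts. It is not DVC's code: `OutsTrie`, `build_outs_trie`, and the `stages` mapping (stage name to output paths, standing in for the repo DAG) are all hypothetical names, and a real implementation would more likely build on an existing trie library than hand-roll one.

```python
from typing import Dict, List, Optional, Tuple


def _parts(path: str) -> Tuple[str, ...]:
    # Split a POSIX-style path into its non-empty components.
    return tuple(p for p in path.split("/") if p)


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.out: Optional[str] = None  # output registered at this node, if any


class OutsTrie:
    """Hypothetical trie of outputs, keyed by path components."""

    def __init__(self) -> None:
        self._root = _Node()

    def add(self, out_path: str) -> None:
        node = self._root
        for part in _parts(out_path):
            node = node.children.setdefault(part, _Node())
        node.out = out_path

    def longest_prefix(self, path: str) -> Optional[str]:
        # Deepest registered output that is `path` itself or one of its parents.
        node, found = self._root, None
        for part in _parts(path):
            node = node.children.get(part)
            if node is None:
                break
            if node.out is not None:
                found = node.out
        return found

    def has_outs_under(self, path: str) -> bool:
        # True if some output lives at or below `path`.
        node = self._root
        for part in _parts(path):
            node = node.children.get(part)
            if node is None:
                return False
        return node.out is not None or bool(node.children)


def build_outs_trie(stages: Dict[str, List[str]]) -> OutsTrie:
    # `stages` is an assumed stand-in for walking the repo DAG once.
    trie = OutsTrie()
    for outs in stages.values():
        for out in outs:
            trie.add(out)
    return trie


stages = {"prepare": ["data/prepared"], "train": ["models/model.pkl"]}
trie = build_outs_trie(stages)
owner = trie.longest_prefix("data/prepared/train/0.csv")  # "data/prepared"
```

With the trie built once, "which output owns this path?" and "are there any outputs under this directory?" both become a walk over the path's components rather than a scan over the DAG, which is the convenience the comment above is after.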

efiop added the refactoring and p2-medium labels Nov 5, 2020
efiop changed the title from "tree: use trie in DvcTree" to "tree: use trie in DvcTree (or in Repo)" Nov 5, 2020
efiop added the performance label Nov 5, 2020
efiop added a commit to efiop/dvc that referenced this issue Nov 7, 2020
Currently we are converting dir_info to/from lists all the time.
The reason is that dir_info is stored as a list of dicts in *.dir files,
but that makes it hard to work with. In addition, we will likely
be changing the .dir file format in the near future (iterative#829), so we
need to abstract dir_info away into something whose on-disk storage
format we don't have to care about.

Related iterative#3256
Related iterative#4847
efiop added a commit that referenced this issue Nov 7, 2020

efiop commented May 3, 2021

Already using repo.outs_trie

@efiop efiop closed this as completed May 3, 2021