
WIP: correction cache refactor init #715


Draft: mitya52 wants to merge 6 commits into dev
Conversation

@mitya52 (Member) commented Apr 28, 2025

Files and dirs caches are now based on a trie data structure (rough sketch after the todo list below):

  • 2x less memory
  • 3x faster build

todo:

  • the tree itself is still slow and memory-inefficient
  • dirs cache over all subdirs
  • possible fuzzy-search improvements
  • top-n optimizations in find matches and so on
  • tests
  • Windows!!!
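
For illustration, here is a minimal sketch of the kind of path-component trie the description suggests (all names here, like `PathTrie` and `insert`, are hypothetical, not this PR's actual code). Storing one node per component deduplicates shared prefixes, which is presumably where the memory savings come from:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a path-component trie: each node holds its
// children keyed by component, plus how many cached paths pass through it.
#[derive(Default)]
struct PathTrie {
    children: HashMap<String, PathTrie>,
    count: usize, // number of inserted paths passing through this node
}

impl PathTrie {
    fn insert<'a>(&mut self, components: impl Iterator<Item = &'a str>) {
        let mut node = self;
        for comp in components {
            node = node.children.entry(comp.to_string()).or_default();
            node.count += 1;
        }
    }
}

fn main() {
    let mut trie = PathTrie::default();
    // The shared prefix "home/user/work" is stored once, not per file.
    trie.insert("home/user/work/dir1/file.ext".split('/'));
    trie.insert("home/user/work/dir2/file.ext".split('/'));
}
```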

@mitya52 mitya52 requested a review from humbertoyusta April 28, 2025 10:43
// NOTE: this is a hack for fuzzy_search only.
// The algorithm iterates over all unique_paths.
// I'm sure we can find a better way to implement it.
unique_paths: HashSet<Vec<usize>>,
Contributor:

These unique paths for fuzzy search are, I think, meant to be like the previous cache_shortened. To do fuzzy search over them, they need to be AT LEAST the length relative to the workspace folder. For example, if I have opened my project in /home/user/work/ and I have a file /home/user/work/dir1/file.ext, it should never be shortened to just file.ext, but to dir1/file.ext.
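
To illustrate that constraint (the paths and the helper name are made up for the example): the shortest acceptable shortened form is the workspace-relative path, which `Path::strip_prefix` gives directly:

```rust
use std::path::Path;

// Hypothetical helper: the shortest allowed shortened form of a path is
// its workspace-relative suffix, never just the file name.
fn shortest_allowed(path: &Path, workspace: &Path) -> Option<String> {
    path.strip_prefix(workspace)
        .ok()
        .map(|rel| rel.to_string_lossy().to_string())
}

fn main() {
    let ws = Path::new("/home/user/work");
    let file = Path::new("/home/user/work/dir1/file.ext");
    // Shortening stops at "dir1/file.ext", not "file.ext".
    assert_eq!(shortest_allowed(file, ws).as_deref(), Some("dir1/file.ext"));
}
```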

Contributor:

And to implement it, I guess we could just mark some nodes with a boolean flag that is true when the node ends one of those shortened paths, like dir1/file.ext. We can mark them after the build by descending the trie until the count is 1 and we are deep enough that we crop no more than the workspace folder.

Then iterating through those paths would mean walking the trie and retrieving the marked nodes.
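
A rough sketch of how that marking pass and iteration might look (the `Node` struct, the `marked` flag, and `min_depth` are my guesses at an implementation, not code from this PR):

```rust
use std::collections::HashMap;

// Hypothetical trie node, extended with the boolean flag discussed above.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    count: usize, // paths passing through this node
    marked: bool, // true if this node ends a valid shortened path
}

// Post-build marking pass: mark the first node on each branch where the
// path has become unique (count == 1), but only once we are at least
// `min_depth` components in, so we never crop more than the workspace folder.
fn mark(node: &mut Node, depth: usize, min_depth: usize) {
    if node.count == 1 && depth >= min_depth {
        node.marked = true;
        return; // going deeper would only make the shortened path longer
    }
    for child in node.children.values_mut() {
        mark(child, depth + 1, min_depth);
    }
}

// Iterating "those ones" is then a walk that collects the marked nodes.
fn collect_marked(node: &Node, prefix: Vec<String>, out: &mut Vec<Vec<String>>) {
    if node.marked {
        out.push(prefix.clone());
    }
    for (comp, child) in &node.children {
        let mut next = prefix.clone();
        next.push(comp.clone());
        collect_marked(child, next, out);
    }
}
```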

Contributor:

There may be a simpler implementation, but this one would work and is not super complex, I guess.

.map(|comp| comp.as_os_str().to_string_lossy().to_string())
.collect();

for i in (0..components.len()).rev() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here: we shouldn't crop more than the workspace folder. I'm not sure this handles that well.
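
One hedged way to enforce that bound in a loop like the one above, assuming `components` holds the path's parts and `workspace_len` is the number of leading components belonging to the workspace folder (both are assumptions, since the surrounding code isn't shown):

```rust
// Sketch: i is how many leading components get cropped; never crop past
// the workspace prefix, so the shortest result is the workspace-relative path.
fn shortened_suffixes(components: &[String], workspace_len: usize) -> Vec<String> {
    (0..=workspace_len)
        .map(|i| components[i..].join("/"))
        .collect()
}

fn main() {
    let components: Vec<String> = ["home", "user", "work", "dir1", "file.ext"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Workspace home/user/work is 3 components, so the shortest suffix
    // produced is "dir1/file.ext", never just "file.ext".
    assert_eq!(shortened_suffixes(&components, 3).last().unwrap(), "dir1/file.ext");
}
```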

// it's dangerous to use cache_correction_arc without a mutex, but should be fine as long as it's read-only
// (another thread never writes to the map itself, it can only replace the arc with a different map)

if let Some(fixed) = (*cache_correction_arc).get(&correction_candidate) {
    return fixed.iter().cloned().collect::<Vec<String>>();
// NOTE: do we need top_n here?
Contributor:

top_n is mostly a limit for fuzzy search, I guess. We could assume not that many files will match in the non-fuzzy case, but maybe they will, so I'm not sure; maybe we should apply top_n in both cases? I think we already handle "... and n files more" somewhere, though I'm not sure it applies here.
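
As an aside on the read-only arc pattern quoted above, a minimal sketch of the invariant as I understand it (the types are illustrative, not this PR's): readers grab a snapshot of the `Arc` and read a frozen map, while the writer builds a whole new map and swaps the `Arc`, never mutating the shared map in place. The `RwLock` here guards only the swap point, not the map reads, which matches the "no mutex around the map itself" idea:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Illustrative types: a read-only correction map behind a swappable Arc.
type CorrectionMap = HashMap<String, Vec<String>>;
type SharedCache = RwLock<Arc<CorrectionMap>>;

// Reader: clone the Arc (cheap), then read the frozen map with no lock held.
fn lookup(cache: &SharedCache, key: &str) -> Option<Vec<String>> {
    let map = cache.read().unwrap().clone(); // snapshot: Arc<CorrectionMap>
    map.get(key).cloned()
}

// Writer: never mutates the shared map in place; it builds a new map and
// replaces the Arc, so readers holding the old snapshot are unaffected.
fn rebuild(cache: &SharedCache, new_map: CorrectionMap) {
    *cache.write().unwrap() = Arc::new(new_map);
}
```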

@mitya52 changed the title from "WIP: correction chash refactor init" to "WIP: correction cache refactor init" on Apr 29, 2025