
WIP: correction cache refactor init #715


Draft: mitya52 wants to merge 6 commits into dev
Conversation

@mitya52 (Member) commented Apr 28, 2025

Files and dirs caches are now based on a trie data structure (rough sketch after the todo list below):

  • 2x less memory
  • 3x faster build

todo:

  • the tree itself is still slow and memory-inefficient
  • dirs cache over all subdirs
  • possible fuzzy-search improvements
  • top-n optimizations in find matches and so on
  • tests
  • Windows!!!
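
For illustration, here is a minimal sketch of the kind of path-component trie the description suggests (all names here, like `PathTrie` and `insert`, are hypothetical, not this PR's actual code). Storing one node per component deduplicates shared prefixes, which is presumably where the memory savings come from:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a path-component trie: each node holds its
// children keyed by component, plus how many cached paths pass through it.
#[derive(Default)]
struct PathTrie {
    children: HashMap<String, PathTrie>,
    count: usize, // number of inserted paths passing through this node
}

impl PathTrie {
    fn insert<'a>(&mut self, components: impl Iterator<Item = &'a str>) {
        let mut node = self;
        for comp in components {
            node = node.children.entry(comp.to_string()).or_default();
            node.count += 1;
        }
    }
}

fn main() {
    let mut trie = PathTrie::default();
    // The shared prefix "home/user/work" is stored once, not per file.
    trie.insert("home/user/work/dir1/file.ext".split('/'));
    trie.insert("home/user/work/dir2/file.ext".split('/'));
}
```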

@mitya52 mitya52 requested a review from humbertoyusta April 28, 2025 10:43
// NOTE: this is a hack for fuzzy_search only.
// The algorithm iterates over all unique_paths.
// I'm sure we can find a better way to implement it.
unique_paths: HashSet<Vec<usize>>,
Contributor:

These unique paths for fuzzy search are, I think, meant to be like the previous cache_shortened. To do fuzzy search over them, they need to be AT LEAST the length relative to the workspace folder. For example, if I have opened my project in /home/user/work/ and I have a file /home/user/work/dir1/file.ext, it should never be shortened to just file.ext, but to dir1/file.ext.
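
To illustrate that constraint (the paths and the helper name are made up for the example): the shortest acceptable shortened form is the workspace-relative path, which `Path::strip_prefix` gives directly:

```rust
use std::path::Path;

// Hypothetical helper: the shortest allowed shortened form of a path is
// its workspace-relative suffix, never just the file name.
fn shortest_allowed(path: &Path, workspace: &Path) -> Option<String> {
    path.strip_prefix(workspace)
        .ok()
        .map(|rel| rel.to_string_lossy().to_string())
}

fn main() {
    let ws = Path::new("/home/user/work");
    let file = Path::new("/home/user/work/dir1/file.ext");
    // Shortening stops at "dir1/file.ext", not "file.ext".
    assert_eq!(shortest_allowed(file, ws).as_deref(), Some("dir1/file.ext"));
}
```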

Contributor:

And to implement it, I guess we could just mark some nodes with a boolean flag that is true when the node ends one of those shortened paths, like dir1/file.ext. We can mark them after the build by descending the trie until the count is 1 and we are deep enough that we crop no more than the workspace folder.

Then iterating through those paths would mean walking the trie and retrieving the marked nodes.
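
A rough sketch of how that marking pass and iteration might look (the `Node` struct, the `marked` flag, and `min_depth` are my guesses at an implementation, not code from this PR):

```rust
use std::collections::HashMap;

// Hypothetical trie node, extended with the boolean flag discussed above.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    count: usize, // paths passing through this node
    marked: bool, // true if this node ends a valid shortened path
}

// Post-build marking pass: mark the first node on each branch where the
// path has become unique (count == 1), but only once we are at least
// `min_depth` components in, so we never crop more than the workspace folder.
fn mark(node: &mut Node, depth: usize, min_depth: usize) {
    if node.count == 1 && depth >= min_depth {
        node.marked = true;
        return; // going deeper would only make the shortened path longer
    }
    for child in node.children.values_mut() {
        mark(child, depth + 1, min_depth);
    }
}

// Iterating "those ones" is then a walk that collects the marked nodes.
fn collect_marked(node: &Node, prefix: Vec<String>, out: &mut Vec<Vec<String>>) {
    if node.marked {
        out.push(prefix.clone());
    }
    for (comp, child) in &node.children {
        let mut next = prefix.clone();
        next.push(comp.clone());
        collect_marked(child, next, out);
    }
}
```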

Contributor:

There may be a simpler implementation, but this one would work and is not super complex, I guess.

.map(|comp| comp.as_os_str().to_string_lossy().to_string())
.collect();

for i in (0..components.len()).rev() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here: we shouldn't crop more than the workspace folder. I'm not sure this handles that well.
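
One hedged way to enforce that bound in a loop like the one above, assuming `components` holds the path's parts and `workspace_len` is the number of leading components belonging to the workspace folder (both are assumptions, since the surrounding code isn't shown):

```rust
// Sketch: i is how many leading components get cropped; never crop past
// the workspace prefix, so the shortest result is the workspace-relative path.
fn shortened_suffixes(components: &[String], workspace_len: usize) -> Vec<String> {
    (0..=workspace_len)
        .map(|i| components[i..].join("/"))
        .collect()
}

fn main() {
    let components: Vec<String> = ["home", "user", "work", "dir1", "file.ext"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Workspace home/user/work is 3 components, so the shortest suffix
    // produced is "dir1/file.ext", never just "file.ext".
    assert_eq!(shortened_suffixes(&components, 3).last().unwrap(), "dir1/file.ext");
}
```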

// it's dangerous to use cache_correction_arc without a mutex, but should be fine as long as it's read-only
// (another thread never writes to the map itself, it can only replace the arc with a different map)

if let Some(fixed) = (*cache_correction_arc).get(&correction_candidate) {
    return fixed.iter().cloned().collect::<Vec<String>>();
// NOTE: do we need top_n here?
Contributor:

top_n is mostly a limit for fuzzy search, I guess. We could assume not that many files will match in the non-fuzzy case, but maybe they will, so I'm not sure; maybe we should apply top_n in both cases? I think we already handle "... and n files more" somewhere, though I'm not sure it applies here.
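
As an aside on the read-only arc pattern quoted above, a minimal sketch of the invariant as I understand it (the types are illustrative, not this PR's): readers grab a snapshot of the `Arc` and read a frozen map, while the writer builds a whole new map and swaps the `Arc`, never mutating the shared map in place. The `RwLock` here guards only the swap point, not the map reads, which matches the "no mutex around the map itself" idea:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Illustrative types: a read-only correction map behind a swappable Arc.
type CorrectionMap = HashMap<String, Vec<String>>;
type SharedCache = RwLock<Arc<CorrectionMap>>;

// Reader: clone the Arc (cheap), then read the frozen map with no lock held.
fn lookup(cache: &SharedCache, key: &str) -> Option<Vec<String>> {
    let map = cache.read().unwrap().clone(); // snapshot: Arc<CorrectionMap>
    map.get(key).cloned()
}

// Writer: never mutates the shared map in place; it builds a new map and
// replaces the Arc, so readers holding the old snapshot are unaffected.
fn rebuild(cache: &SharedCache, new_map: CorrectionMap) {
    *cache.write().unwrap() = Arc::new(new_map);
}
```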

@mitya52 changed the title from "WIP: correction chash refactor init" to "WIP: correction cache refactor init" on Apr 29, 2025