Reduce Linq usage in FileSystemGlobbing #98109
Since GetResultsInFullPath uses a struct (FilePatternMatch) in a Linq call, native AOT'd applications take up extra space on disk because of struct specialization. Remove the Linq call and use a simple loop to create the result instead.
Tagging subscribers to this area: @dotnet/area-extensions-filesystem
Using the following program:
using Microsoft.Extensions.FileSystemGlobbing;
var matcher = new Matcher();
matcher.AddIncludePatterns(["*.txt", "*.asciidoc", "*.md"]);
string searchDirectory = @"C:\temp";
foreach (string file in matcher.GetResultsInFullPath(searchDirectory))
{
Console.WriteLine(file);
}
win-x64 size on disk:
The remaining System.Linq usages in the library don't seem beneficial to remove:
runtime/src/libraries/Microsoft.Extensions.FileSystemGlobbing/src/InMemoryDirectoryInfo.cs, line 48 in 47f325e
runtime/src/libraries/Microsoft.Extensions.FileSystemGlobbing/src/PatternMatchingResult.cs, line 20 in 47f325e
IEnumerable<FilePatternMatch> matches = patternMatchingResult.Files;
List<string> result = matches is ICollection matchCollection ? new(matchCollection.Count) : new();
foreach (FilePatternMatch match in matches)
{
    result.Add(Path.GetFullPath(Path.Combine(directoryPath, match.Path)));
}
return result;
I'm not 100% confident this is better. But it does reduce the size on disk. I'm assuming it also reduces JIT cost and in-memory code size as well since we no longer need to specialize the Linq code for this struct.
Thoughts?
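The trade-off in this comment can be sketched side by side. This is a minimal illustration, not the library's code: `SampleMatch` is a hypothetical stand-in for `FilePatternMatch`, and the `Select(...).ToArray()` shape of the "before" version is an assumption based on the discussion, not a quote of the original source.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for FilePatternMatch; only a Path property is modeled.
public readonly struct SampleMatch
{
    public SampleMatch(string path) => Path = path;
    public string Path { get; }
}

public static class FullPathSketch
{
    // Assumed "before" shape: Select is instantiated over the struct type,
    // so an AOT compiler has to generate code specialized for SampleMatch.
    public static IEnumerable<string> WithLinq(IEnumerable<SampleMatch> matches, string directoryPath) =>
        matches.Select(m => Path.GetFullPath(Path.Combine(directoryPath, m.Path))).ToArray();

    // "After" shape from the diff: a plain loop, with the list pre-sized
    // when the source exposes a Count through the non-generic ICollection.
    public static List<string> WithLoop(IEnumerable<SampleMatch> matches, string directoryPath)
    {
        List<string> result = matches is ICollection c ? new List<string>(c.Count) : new List<string>();
        foreach (SampleMatch match in matches)
        {
            result.Add(Path.GetFullPath(Path.Combine(directoryPath, match.Path)));
        }
        return result;
    }
}
```

Both methods should produce the same sequence of full paths; the difference discussed in this thread is in generated code size, not in the result.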
Why is the ToArray/ToList needed at all? Are we expecting consumers to iterate it multiple times?
> Why is the ToArray/ToList needed at all?
I was trying to keep the behavior as close to the existing as possible. I assume you are suggesting possibly just using a yield return enumerator instead?
> Are we expecting consumers to iterate it multiple times?
I don't think multiple times, but a common pattern I see (a little less than 50% from doing a few sample checks) is to call .ToList() or .ToImmutableArray() on the result of this method call. Having this method return something that implements ICollection/Count seems to be beneficial in those cases.
examples:
> I assume you are suggesting possibly just using a yield return enumerator instead?
Or just returning the Select, in which case someone calling ToList on it will pick up Select's implementation of it. I'm not sure which aspect of your change was the focus and whether your explicit goal was to remove the Select.
IEnumerable<FilePatternMatch> matches = patternMatchingResult.Files;
List<string> result = matches is ICollection matchCollection ? new(matchCollection.Count) : new();
matches is ICollection
ICollection<T> doesn't inherit ICollection; many of our collection types implement both, but it's not guaranteed. Is this just a "good enough" thing because we expect most inputs would be e.g. T[] or List<T>?
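A small sketch of the point about `ICollection<T>` and `ICollection`, using only BCL types. The helper name `CapacityHint` is hypothetical and exists only for illustration; it mirrors the `matches is ICollection` check from the diff.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionCheck
{
    // Hypothetical helper: returns the Count when the source implements the
    // non-generic ICollection, otherwise 0 (meaning "no size hint available").
    public static int CapacityHint<T>(IEnumerable<T> source) =>
        source is ICollection c ? c.Count : 0;

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        int[] array = { 1, 2, 3 };
        var set = new HashSet<int> { 1, 2, 3 };

        // List<T> and T[] implement both ICollection<T> and ICollection.
        Console.WriteLine(CapacityHint(list));   // 3
        Console.WriteLine(CapacityHint(array));  // 3

        // HashSet<T> implements ICollection<T> but not the non-generic
        // ICollection, so the check falls through to the fallback.
        Console.WriteLine(CapacityHint(set));    // 0
    }
}
```

With a source like `HashSet<T>`, the check in the diff would simply take the `new()` fallback, so the result stays correct and only the pre-sizing is lost.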
Judging by the source, matches will always be a List<T> here.
@eerhardt is my understanding correct? If so, to reduce complexity a bit, can we remove the new() fallback and maybe add an "is ICollection" assertion?
Will merge the change without this since it's still valuable.
Sorry I missed the original question.
Since this is a public API and Matcher.Execute is virtual, the PatternMatchingResult can have any IEnumerable<FilePatternMatch> Files set on it. It isn't guaranteed to always be a List<T>.
> Is this just a "good enough" thing because we expect most inputs would be e.g. T[] or List<T>?
Yes, our implementation always returns a List<T>, so we expect this to be a List<T> in most cases.
* Reduce Linq usage in FileSystemGlobbing
Since GetResultsInFullPath uses a struct (FilePatternMatch) in a Linq call, this is causing extra disk space on native AOT'd applications because of struct specialization. Remove the Linq call and instead use a simple loop to create the result.
* Use ToArray instead of ToList to reduce unnecessary allocations.
---------
Co-authored-by: David Cantú <[email protected]>