-
Notifications
You must be signed in to change notification settings - Fork 0
Introduce DenseVector<T> and convert one usage of VBuffer to it. #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// is passed to a row cursor getter, the callee is free to take ownership of | ||
/// and re-use the backing data structure (Buffer). | ||
/// </summary> | ||
public readonly struct DenseVector<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this type? Why not use Memory directly? If you need to be able to resize it, could the caller just hold to the original buffer? Or the method filling in data into Memory would return out parameter specifying how many items were filled in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at https://github.com/dotnet/machinelearning/blob/master/docs/code/VBufferCareFeeding.md#buffer-re-use-as-a-user for the common usage of these vectors.
Why not use Memory directly?
Because if you only had Memory directly, as soon as you wanted to return a vector that had a length less than the total buffer capacity, now you could no longer get back to use the total capacity of the buffer.
could the caller just hold to the original buffer?
It's my understanding that the caller doesn't always know how big the original buffer should be, and thus can't create it.
Or the method filling in data into Memory would return out parameter specifying how many items were filled in.
That won't work with the current IDataView
GetGetter
delegate design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is the following (it's a pattern we use in lots of Span/Memory APIs):
Memory<byte> buffer = ...
while(true){
if(TryFillBuffer(buffer, out written)) {
UseFilledInBuffer(buffer.Slice(0, written));
break;
}
Enlarge(ref buffer);
}
Could we do something like that?
|
||
using System; | ||
|
||
namespace Microsoft.ML.Runtime.Data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the type show in public surface area of mainline scenario APIs? If yes, should this type be in Microsoft.ML root?
@@ -107,6 +108,16 @@ public static int Size<T>(SortedSet<T> x) | |||
return x == null ? 0 : x.Count; | |||
} | |||
|
|||
public static int Size<T>(Memory<T> x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these helpers?
@@ -0,0 +1,15 @@ | |||
L1(avg): 1.926877 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these out.txt files should not be here?
After speaking with @TomFinley, this is not the direction we want to go with VBuffer. Closing. |
Working towards dotnet#608
This is an initial draft of what a "dense vector" could look like in ML.NET. I also updated one location to use it to show how it would work.
Please let me know if you think I'm going in the wrong direction.
@TomFinley @ericstj @Zruty0 @KrzysztofCwalina