Skip to content

Introduce DenseVector<T> and convert one usage of VBuffer to it. #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

eerhardt
Copy link
Owner

@eerhardt eerhardt commented Oct 8, 2018

Working towards dotnet#608

This is an initial draft of what a "dense vector" could look like in ML.NET. I also updated one location to use it to show how it would work.

Please let me know if you think I'm going in the wrong direction.

@TomFinley @ericstj @Zruty0 @KrzysztofCwalina

/// is passed to a row cursor getter, the callee is free to take ownership of
/// and re-use the backing data structure (Buffer).
/// </summary>
public readonly struct DenseVector<T>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this type? Why not use Memory directly? If you need to be able to resize it, could the caller just hold to the original buffer? Or the method filling in data into Memory would return out parameter specifying how many items were filled in.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at https://github.com/dotnet/machinelearning/blob/master/docs/code/VBufferCareFeeding.md#buffer-re-use-as-a-user for the common usage of these vectors.

Why not use Memory directly?

Because if you only had Memory directly, as soon as you wanted to return a vector that had a length less than the total buffer capacity, now you could no longer get back to use the total capacity of the buffer.

could the caller just hold to the original buffer?

It's my understanding that the caller doesn't always know how big the original buffer should be, and thus can't create it.

Or the method filling in data into Memory would return out parameter specifying how many items were filled in.

That won't work with the current IDataView GetGetter delegate design.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is the following (it's a pattern we use in lots of Span/Memory APIs):

Memory<byte> buffer = ...
while(true){
    if(TryFillBuffer(buffer, out written)) {
        UseFilledInBuffer(buffer.Slice(0, written));
        break;
    }
    Enlarge(ref buffer);  
} 

Could we do something like that?


using System;

namespace Microsoft.ML.Runtime.Data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the type show in public surface area of mainline scenario APIs? If yes, should this type be in Microsoft.ML root?

@@ -107,6 +108,16 @@ public static int Size<T>(SortedSet<T> x)
return x == null ? 0 : x.Count;
}

public static int Size<T>(Memory<T> x)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these helpers?

@@ -0,0 +1,15 @@
L1(avg): 1.926877

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these out.txt files should not be here?

@eerhardt
Copy link
Owner Author

After speaking with @TomFinley, this is not the direction we want to go with VBuffer. Closing.

@eerhardt eerhardt closed this Oct 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants