DataFrame - add support for vbuffer #5872

Open
Tracked by #6144
LittleLittleCloud opened this issue Jul 7, 2021 · 6 comments
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs
Milestone

Comments

@LittleLittleCloud
Contributor

It seems that the DataFrame API still doesn't support VBuffer: if an IDataView contains a VBuffer-typed column, ToDataFrame() will fail.

@LittleLittleCloud LittleLittleCloud added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Jul 7, 2021
@lqdev

lqdev commented Aug 11, 2021

To give an example, for the following dataset:

5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa

using the following IDataView schema:

type ModelInput = {
    [<LoadColumn(0,3)>] Features: float32 array
    [<LoadColumn(4)>] Label: string
}

ToDataFrame throws the following error when called:

System.NotSupportedException: VBuffer`1 is not a supported column type.
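
For reference, the report above might be reproduced with an F# script along these lines. This is only a sketch under assumptions: the file name `iris.txt` is hypothetical and stands for the rows shown earlier saved as space-separated text, and the last line relies on the `ToDataFrame` overload that accepts selected column names. That last line is a possible workaround, not something confirmed in this thread.

```fsharp
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.Data.Analysis"

open System
open Microsoft.ML
open Microsoft.ML.Data
open Microsoft.Data.Analysis

// Same schema as in the comment above: columns 0..3 become one vector column.
type ModelInput = {
    [<LoadColumn(0,3)>] Features: float32 array
    [<LoadColumn(4)>] Label: string
}

let mlContext = MLContext()

// "iris.txt" is a hypothetical path to the space-separated sample rows.
let dataView =
    mlContext.Data.LoadFromTextFile<ModelInput>("iris.txt", separatorChar = ' ')

// Converting every column hits the vector-typed Features column,
// which is where the reported NotSupportedException comes from.
try
    dataView.ToDataFrame() |> ignore
with
| :? NotSupportedException as ex -> printfn "%s" ex.Message

// Assumption: restricting the conversion to scalar columns sidesteps the problem.
let labelsOnly = dataView.ToDataFrame("Label")
```

If the selected-columns overload behaves as assumed, `labelsOnly` holds only the `Label` column, so the feature values would still have to be read some other way.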

@LittleLittleCloud
Contributor Author

@ericstj @eerhardt @michaelgsharp
What would it take to add support for VBuffer, or more generally Object, in the DataFrame API? Is there any roadmap for that?

@michaelgsharp
Contributor

This is going to need some further investigation to see what it would take. It's on our roadmap, but we will be taking a look at it after we get TorchSharp resolved.

@michaelgsharp michaelgsharp added this to the ML.NET Future milestone Mar 2, 2022
@ericstj
Member

ericstj commented Mar 2, 2022

@eerhardt have you thought of this before or are you aware of any discussion with Prasanth about it? If we have an idea of how it would work we could write that up here in case someone else might be interested in helping fix this.

@eerhardt
Member

eerhardt commented Mar 2, 2022

I really haven't given it deep thought. I know it is a problem, but I'm not sure exactly how to structure a DataFrameColumn that contains VBuffer instances. The two are a little bit at odds: VBuffer is supposed to be a "buffer" that changes as you "cursor" over the rows of an IDataView, whereas DataFrame wants everything loaded in memory at once. But maybe we can have a column derived from DataFrameColumn that contains a distinct VBuffer for every row in the DataFrame.

That's about as far as I've gone thinking of this.

See also #5721
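
The tension described in this comment can be illustrated with a minimal sketch, assuming a `float32` vector column. The helper name `readVectorColumn` is hypothetical and not part of ML.NET or Microsoft.Data.Analysis. The sketch only shows the general idea of copying a reused buffer into distinct per-row storage; it does not define an actual DataFrameColumn type.

```fsharp
#r "nuget: Microsoft.ML"

open System.Collections.Generic
open Microsoft.ML
open Microsoft.ML.Data

// Hypothetical helper: materialize one float32 vector column as one array per row.
let readVectorColumn (dataView: IDataView) (columnName: string) : float32[][] =
    let column = dataView.Schema.[columnName]
    use cursor = dataView.GetRowCursor([ column ])
    let getter = cursor.GetGetter<VBuffer<float32>>(column)

    // One buffer is reused for every row, as the cursor pattern intends.
    let mutable buffer = Unchecked.defaultof<VBuffer<float32>>
    let rows = List<float32[]>()

    while cursor.MoveNext() do
        // Overwrites the contents of the shared buffer with the current row.
        getter.Invoke(&buffer)
        // Copy the values out so that each row owns its own storage.
        rows.Add(Array.ofSeq (buffer.DenseValues()))

    rows.ToArray()
```

Copying on every row trades memory for independence between rows, which is roughly what an in-memory column would need. A real column type would also have to decide how to represent sparse vectors, which this sketch densifies.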

@jeffhandley
Member

@michaelgsharp or @JakeRadMSFT -- Was this completed with #6409, or is there more to do here still?


6 participants