-
Notifications
You must be signed in to change notification settings - Fork 1.9k
IDataView Cleanup: Predicates from int to Column #1529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@TomFinley - does this also apply to these functions? machinelearning/src/Microsoft.Data.DataView/IDataView.cs Lines 134 to 144 in c90fa51
If not, and we want to keep these functions taking an |
@eerhardt thanks for the note. I'll make the change on the PR that will close this issue. |
Hi @eerhardt sorry just saw this. Yes, according to my reasons for the change, this would benefit as well. Actually the very first misuse I ever saw of an |
As seen in #1500, schema is being changed so that schemas contain columns.
For various reasons it may be easiest once
IDataView
is a class.Let us consider the use of predicates in the
IDataView
system, e.g., when getting a cursor:machinelearning/src/Microsoft.ML.Core/Data/IDataView.cs
Line 103 in f920262
or elsewhere when forming a mapper, and getting dependencies:
machinelearning/src/Microsoft.ML.Core/Data/ISchemaBindableMapper.cs
Line 77 in f920262
The use of integer indices here has sometimes led to confusion or even bugs. With the change of #1500 under consideration, this suggests a possibly better way.
It may be worth considering whether the columns in the new scheme suggested in #1500 should have a backref to the original schema (even as an internal field that is checked by the data-view abstract class), so as to enable an easy way to check whether that column in fact came from that schema, or, even without that backref, to check whether the columns exist.
We could also consider this dependency be expressed not as a delegate, but instead just some sort of collection of columns, since that would also make this easier to explain.
Note that while this makes the interface to
IDataView
easier, it makes the implementation harder, at least, if we suppose that all dataviews are possible for handling these inputs correctly and verifying that there aren't any shenaningans going on with input columns being from a different schema (which we can and so almost certainly should do under this new scheme). This suggests a change toIDataView
, possibly done onceIDataView
is a class, so that the utility mapping from these column objects back down to indices (which must still happen internally) is handled by common code. It would also enable if these column objects have some sort of internal backreference to the schema, the ability to check that the input schemas are in fact correct. (This we obviously cannot do today with indices!)/cc @Zruty0 @shauheen @terrajobst
The text was updated successfully, but these errors were encountered: