Skip to content

IDataView Cleanup: Predicates from int to Column #1529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomFinley opened this issue Nov 5, 2018 · 3 comments
Closed

IDataView Cleanup: Predicates from int to Column #1529

TomFinley opened this issue Nov 5, 2018 · 3 comments
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@TomFinley
Copy link
Contributor

TomFinley commented Nov 5, 2018

As seen in #1500, schema is being changed so that schemas contain columns.

For various reasons it may be easiest once IDataView is a class.

Let us consider the use of predicates in the IDataView system, e.g., when getting a cursor:

IRowCursor GetRowCursor(Func<int, bool> needCol, IRandom rand = null);

or elsewhere when forming a mapper, and getting dependencies:

Func<int, bool> GetDependencies(Func<int, bool> predicate);

The use of integer indices here has sometimes led to confusion or even bugs. With the change of #1500 under consideration, this suggests a possibly better way.

It may be worth considering whether the columns in the new scheme suggested in #1500 should have a backref to the original schema (even as an internal field that is checked by the data-view abstract class), so as to enable an easy way to check whether that column in fact came from that schema, or, even without that backref, to check whether the columns exist.

We could also consider this dependency be expressed not as a delegate, but instead just some sort of collection of columns, since that would also make this easier to explain.

Note that while this makes the interface to IDataView easier, it makes the implementation harder, at least, if we suppose that all dataviews are possible for handling these inputs correctly and verifying that there aren't any shenaningans going on with input columns being from a different schema (which we can and so almost certainly should do under this new scheme). This suggests a change to IDataView, possibly done once IDataView is a class, so that the utility mapping from these column objects back down to indices (which must still happen internally) is handled by common code. It would also enable if these column objects have some sort of internal backreference to the schema, the ability to check that the input schemas are in fact correct. (This we obviously cannot do today with indices!)

/cc @Zruty0 @shauheen @terrajobst

@eerhardt
Copy link
Member

@TomFinley - does this also apply to these functions?

/// <summary>
/// Returns whether the given column is active in this row.
/// </summary>
public abstract bool IsColumnActive(int col);
/// <summary>
/// Returns a value getter delegate to fetch the given column value from the row.
/// This throws if the column is not active in this row, or if the type
/// <typeparamref name="TValue"/> differs from this column's type.
/// </summary>
public abstract ValueGetter<TValue> GetGetter<TValue>(int col);

If not, and we want to keep these functions taking an int, we should rename the parameters to columnIndex instead of col.

@sfilipi
Copy link
Member

sfilipi commented Feb 27, 2019

@eerhardt thanks for the note. I'll make the change on the PR that will close this issue.

@TomFinley
Copy link
Contributor Author

@TomFinley - does this also apply to these functions?

Hi @eerhardt sorry just saw this. Yes, according to my reasons for the change, this would benefit as well. Actually the very first misuse I ever saw of an IDataView w.r.t. indices confusion was a mistaken fetching of the GetGetter (or, more precisely I suppose, GetColumnValue back before we encapsulated the value retrieval into a delegate).

@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

4 participants