IDataView Cleanup: Predicates from int to Column #1529

Closed
@TomFinley


As seen in #1500, the schema is being changed so that schemas contain columns.

For various reasons, the change proposed here may be easiest to make once IDataView is a class.

Let us consider the use of predicates in the IDataView system, e.g., when getting a cursor:

IRowCursor GetRowCursor(Func<int, bool> needCol, IRandom rand = null);

or elsewhere when forming a mapper, and getting dependencies:

Func<int, bool> GetDependencies(Func<int, bool> predicate);

The use of integer indices here has sometimes led to confusion or even bugs. With the change of #1500 under consideration, a possibly better way suggests itself: express these predicates over columns rather than over integer indices.

It may be worth considering whether the columns in the new scheme suggested in #1500 should hold a backreference to the original schema (even if only as an internal field that is checked by the data-view abstract class). That would give an easy way to check whether a column in fact came from that schema. Even without the backreference, we could at least check whether the columns exist.

We could also consider expressing this dependency not as a delegate but as some sort of collection of columns, since that would also be easier to explain.
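To make the "collection of columns" idea concrete, here is a minimal sketch. The `Column` type and the `DependencySketch.AsPredicate` helper are hypothetical stand-ins invented for illustration, not types from #1500 or from the existing code base. The sketch only shows that a collection of columns can always be lowered to the `Func<int, bool>` shape used today, so callers could work with columns while the internals keep using indices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the column type proposed in #1500.
public sealed class Column
{
    public string Name { get; }
    public int Index { get; }

    public Column(string name, int index)
    {
        Name = name;
        Index = index;
    }
}

public static class DependencySketch
{
    // Lowers a collection of needed columns to the index predicate
    // that the current interfaces expect.
    public static Func<int, bool> AsPredicate(IEnumerable<Column> needed)
    {
        var indices = new HashSet<int>(needed.Select(c => c.Index));
        return i => indices.Contains(i);
    }
}
```

A caller would then name the columns it needs, for example `DependencySketch.AsPredicate(new[] { label, features })`, instead of writing a lambda over raw indices.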

Note that while this makes the interface to IDataView easier, it makes the implementation harder, at least if we suppose that all data views are responsible for handling these inputs correctly and for verifying that there aren't any shenanigans going on with input columns coming from a different schema (which we can, and so almost certainly should, do under this new scheme). This suggests a change to IDataView, possibly made once IDataView is a class, so that the utility mapping from these column objects back down to indices (which must still happen internally) is handled by common code. If these column objects have some sort of internal backreference to the schema, it would also make it possible to check that the input schemas are in fact correct. (This we obviously cannot do today with indices!)
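As a rough illustration of what that common code might look like, the sketch below maps column objects back to indices and rejects columns that came from a different schema. Everything here is an assumption made for the example: the `Schema` and nested `Schema.Column` classes, the internal `Owner` backreference, and the `ColumnMapping.ToActiveIndices` helper are hypothetical names, not the design from #1500.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical schema whose columns carry an internal backreference to it.
public sealed class Schema
{
    private readonly Column[] _columns;

    public int Count => _columns.Length;
    public Column this[int index] => _columns[index];

    public Schema(params string[] names)
    {
        _columns = new Column[names.Length];
        for (int i = 0; i < names.Length; i++)
            _columns[i] = new Column(this, names[i], i);
    }

    public sealed class Column
    {
        // Internal backreference, used only for verification.
        internal Schema Owner { get; }
        public string Name { get; }
        public int Index { get; }

        internal Column(Schema owner, string name, int index)
        {
            Owner = owner;
            Name = name;
            Index = index;
        }
    }
}

public static class ColumnMapping
{
    // Shared utility: verifies each requested column belongs to the given
    // schema, then lowers the request to a per-index "active" array.
    public static bool[] ToActiveIndices(Schema schema, IEnumerable<Schema.Column> needed)
    {
        var active = new bool[schema.Count];
        foreach (var col in needed)
        {
            if (!ReferenceEquals(col.Owner, schema))
                throw new ArgumentException(
                    $"Column '{col.Name}' does not belong to this schema.", nameof(needed));
            active[col.Index] = true;
        }
        return active;
    }
}
```

With this shape, passing a column taken from another schema fails immediately, even when its name and index happen to match, which is exactly the check that bare indices cannot support.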

/cc @Zruty0 @shauheen @terrajobst

Labels

API: Issues pertaining to the friendly API