Skip to content

Replace predicate with an IEnumerable<DataViewSchema.Column> for IRowToRowMapper.GetRow and ISchemaBoundRowMapper.GetRow #2796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 8, 2019

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Feb 28, 2019

Closes #1529

Note to reviewers: The first commit does all the replacement work.
The second commit, has more files on it, because it deals with renaming the col to columnIndex in the other two methods of IDataView (@eerhardt comment on #1529)
The third commit is me reviewing the first commit, and making sure that GetRow is an explicit implementation everywhere.

I have had most of the work on this PR done before breaking ISchemaBoundRowMapper deriving from IRowToRowMapper; so I kept the signature change of ISchemaBoundRowMapper.GetRow although that interface is internal. LMK if that is not desirable.

@sfilipi sfilipi self-assigned this Feb 28, 2019
@sfilipi sfilipi added the API Issues pertaining the friendly API label Feb 28, 2019

/// <summary>
/// Returns a value getter delegate to fetch the given column value from the row.
/// Returns a value getter delegate to fetch the valueof column with the given columnIndex, from the row.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valueof [](start = 57, length = 7)

space #Resolved


/// <summary>
/// Returns a value getter delegate to fetch the given column value from the row.
/// Returns a value getter delegate to fetch the valueof column with the given columnIndex, from the row.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columnIndex [](start = 87, length = 11)

paramref? #Resolved

public override ValueGetter<TValue> GetGetter<TValue>(int col) => _annotations.GetGetter<TValue>(col);

/// <summary>
/// Returns a value getter delegate to fetch the valueof column with the given columnIndex, from the row.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valueof [](start = 61, length = 7)

space, I guess across all files. #Resolved

Ch.Check(IsColumnActive(col), "The column must be active against the defined predicate.");
if (!(Getters[col] is ValueGetter<TValue>))
Ch.Check(IsColumnActive(columnIndex), "The column must be active against the defined predicate.");
if (!(Getters[columnIndex] is ValueGetter<TValue>))
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ch.CheckParam(IsColumnActive(columnIndex), nameof(columnIndex), "requested column not active");
from transposeLoader.cs
I like it more than some mythical defined predicate #Resolved

Ch.Check(0 <= col && col < Schema.Count, "Column index is out of range");
return Getters[col] != null;
Ch.Check(0 <= columnIndex && columnIndex < Schema.Count, "Column index is out of range");
return Getters[columnIndex] != null;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ch.CheckParam(0 <= columnIndex && columnIndex < _colToActive.Length, nameof(columnIndex));
in files above this one we use this format. consistency would be nice. not a blocker tho. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not have _colToActive. Leaving it.


In reply to: 261485256 [](ancestors = 261485256)

Ch.Check(_active[col], "column is not active");
var column = _view._columns[col] as Column<TValue>;
Ch.Check(0 <= columnIndex & columnIndex < Schema.Count);
Ch.Check(_active[columnIndex], "column is not active");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"column is not active" [](start = 51, length = 22)

for the sake of consistency: "requested column not active" #Resolved

throw Ch.Except("Column #{0} is requested but not active in the cursor", col);
var getter = _getters[_colToActivesIndex[col]] as ValueGetter<TValue>;
if (!IsColumnActive(columnIndex))
throw Ch.Except("Column #{0} is requested but not active in the cursor", columnIndex);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in how many different ways we can do same thing :)
Ch.CheckParam(IsColumnActive(columnIndex), nameof(columnIndex), "requested column not active");? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in many.


In reply to: 261485672 [](ancestors = 261485672)

@@ -49,18 +49,19 @@ IEnumerable<DataViewSchema.Column> IRowToRowMapper.GetDependencies(IEnumerable<D
return columnsNeeded;
}

public DataViewRow GetRow(DataViewRow input, Func<int, bool> active)
DataViewRow IRowToRowMapper.GetRow(DataViewRow input, IEnumerable<DataViewSchema.Column> activeColumns)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEnumerable<DataViewSchema.Column> activeColumns [](start = 62, length = 48)

It's probably quite late considering how much work you did already for this issue.
So we replace functions to schema columns.
In past we had just indexes and we just check are they belong to schema range or not.
Now we accept schema columns. But we use only index position in them.
Which can lead to situation where i have two dataviews with following schemas:
one is A: int, B: float, and another is C:string, D: bool.
I want to get rows from first dataview. but I made mistake and I pass schema from second dataview as active columns.
I would assume it would work perfectly fine in current state of the code. I'm not sure it should.
#Pending

Copy link
Member Author

@sfilipi sfilipi Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. should switch from tracking activeIndices to active columns at some point. Logged: #2816


In reply to: 261487085 [](ancestors = 261487085)

}
if (predicate(OutputSchema.Count - 1))

if (activeColumns.Select(c => c.Name).Contains(DefaultColumnNames.Probability))
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (activeColumns.Select(c => c.Name).Contains(DefaultColumnNames.Probability)) [](start = 15, length = 80)

I would rather change if inside foreach loop to set bool flag to true if column.Index == OutputSchema.Count-1 instead of continue and check that flag here.
You have mix of two different conditions right now.
And who knows maybe we will allow user to define their own names for probability column in future, so index feels somewhat more secure. #Resolved

@@ -385,9 +385,7 @@ public DataViewRow GetRow(DataViewRow input, Func<int, bool> active)
}

public Func<int, bool> GetGenericPredicate(Func<int, bool> predicate)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetGenericPredicate [](start = 35, length = 19)

is it still needed? #Resolved

var totalColumnsCount = 1 + _outputGenericSchema.Count;
var getters = new Delegate[totalColumnsCount];

if (active(totalColumnsCount - 1))
if (activeColumns.Select(c => c.Index).Contains(totalColumnsCount - 1))
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalColumnsCount - 1 [](start = 64, length = 21)

this is _outputGenericSchema.Count :) #Resolved

{
Ch.Check(0 <= col & col < _bindings.ColumnCount, "col");
Ch.Check(0 <= columnIndex & columnIndex < _bindings.ColumnCount, "col");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"col" [](start = 77, length = 5)

nameof! #Resolved

{
Func<int, bool> activeInfos =
iinfo =>
{
int col = _bindings.MapIinfoToCol(iinfo);
return active(col);
return activeColumns.Select(c => c.Index).Contains(col);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activeColumns.Select(c => c.Index) [](start = 27, length = 34)

Make a hashset before you define activeInfos and use hashSet.Contains here.
#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.


In reply to: 261497161 [](ancestors = 261497161)


IRowToRowMapper[] innerMappers = new IRowToRowMapper[0];
if (mapper is CompositeRowToRowMapper compositeMapper)
innerMappers = compositeMapper.InnerMappers;

var activeIndices = activeColumns.Select(c => c.Index);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToArray or HashSet. or Utils.BuildArray #Resolved

{
Func<int, bool> activeInfos =
iinfo =>
{
int col = _bindings.MapIinfoToCol(iinfo);
return active(col);
return activeColumns.Select(c => c.Index).Contains(col);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activeColumns.Select(c => c.Index) [](start = 27, length = 34)

hashset #Resolved

Func<int, bool> activeInfos = iinfo => active(bindings.MapIinfoToCol(iinfo));
GetActive(bindings, activeColumns, out inputColumns, out IEnumerable<DataViewSchema.Column> activeMapperColumns);
var output = bindings.RowMapper.GetRow(input, activeMapperColumns);
Func<int, bool> activeInfos = iinfo => activeColumns.Select(c => c.Index).Contains(bindings.MapIinfoToCol(iinfo));
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activeColumns [](start = 51, length = 13)

this can be quite heavy #Pending

/// </summary>
public abstract bool IsColumnActive(int col);
public abstract bool IsColumnActive(int columnIndex);
Copy link
Member

@eerhardt eerhardt Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were changing these two methods to take in the column instead of the index?

public abstract bool IsColumnActive(DataViewSchema.Column column); #Resolved

Copy link
Member Author

@sfilipi sfilipi Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the predicates were deemed hard to debug. Ideally we'd pass the column and not deal with the indexes but is that a V1 issue? @shauheen @glebuk @TomFinley #Resolved

Copy link
Member

@eerhardt eerhardt Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue being closed here says:

The use of integer indices here has sometimes led to confusion or even bugs.

Which seems to me that we shouldn't have integer indices in our public API. So if we ship V1 like this, we won't be able to remove/change this API. We can add a new virtual method that takes in the Column, but this current API will need to live forever. #Resolved

@@ -1153,7 +1155,7 @@ private static IDataView AddVarLengthColumn<TSrc>(IHostEnvironment env, IDataVie
throw Contracts.Except("Multiple unweighted rows found in metrics data view.");

numResults++;
UpdateSums(isWeightedCol, stratCol, stratVal, agg, numFolds > 1, metricNames, hasWeighted, hasStrats,
UpdateSums(isWeightedColumn.Value.Index, stratCol, stratVal, agg, numFolds > 1, metricNames, hasWeighted, hasStrats,
Copy link
Member Author

@sfilipi sfilipi Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch to isWeightedCol #Resolved

@@ -132,16 +132,18 @@ public abstract class DataViewRow : IDisposable
public abstract ValueGetter<DataViewRowId> GetIdGetter();

/// <summary>
/// Returns whether the given column is active in this row.
/// Returns whether the give column is active in this row.
Copy link
Member Author

@sfilipi sfilipi Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix #Resolved

@sfilipi
Copy link
Member Author

sfilipi commented Mar 5, 2019

    }

try removing this, as it has only one reference. #WontFix


Refers to: src/Microsoft.ML.Core/Utilities/Utils.cs:847 in 7aa80a1. [](commit_id = 7aa80a1, deletion_comment = False)

@@ -523,7 +523,7 @@ private DataViewRowCursor[] SplitCore(IChannelProvider ch, DataViewRowCursor inp
// Create the mappings between active column index, and column index.
int[] activeToCol;
int[] colToActive;
Utils.BuildSubsetMaps(_schema.Count, input.IsColumnActive, out activeToCol, out colToActive);
Utils.BuildSubsetMaps(_schema.Count, _schema, input.IsColumnActive, out activeToCol, out colToActive);
Copy link
Member Author

@sfilipi sfilipi Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_schema.Count [](start = 38, length = 13)

try removing the count, if we're passing schema #Resolved

@sfilipi
Copy link
Member Author

sfilipi commented Mar 5, 2019

    }

it is too involved atm.


In reply to: 469811161 [](ancestors = 469811161)


Refers to: src/Microsoft.ML.Core/Utilities/Utils.cs:847 in 7aa80a1. [](commit_id = 7aa80a1, deletion_comment = False)

/// This throws if the column is not active in this row, or if the type
/// <typeparamref name="TValue"/> differs from this column's type.
/// </summary>
public abstract ValueGetter<TValue> GetGetter<TValue>(int col);
/// <typeparam name="TValue"> is the output column's content type.</typeparam>
Copy link
Member

@eerhardt eerhardt Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Is it confusing to say output column here and below? To the consumer of an IDataView, they are just columns. #Resolved

@@ -50,7 +50,7 @@ public AnnotationsDebuggerProxy(DataViewSchema.Annotations annotations)
private static object GetValue<T>(DataViewSchema.Annotations annotations, int columnIndex)
{
T value = default;
annotations.GetGetter<T>(columnIndex)(ref value);
annotations.GetGetter<T>(annotations.Schema[columnIndex])(ref value);
Copy link
Member

@eerhardt eerhardt Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is only called in one place - which already has the Column. Can we just pass the column through here? #Closed

@@ -41,16 +41,16 @@ public AnnotationsDebuggerProxy(DataViewSchema.Annotations annotations)
foreach (var column in annotations.Schema)
{
var name = column.Name;
var value = Utils.MarshalInvoke(GetValue<int>, column.Type.RawType, annotations, column.Index);
var value = Utils.MarshalInvoke(GetValue<DataViewSchema.Column>, column.Type.RawType, annotations, column);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataViewSchema.Column [](start = 57, length = 21)

Incidentally this was not necessary or helpful. The point is to merely have a method for the method to sink its teeth into, so the type parameter here was not informative. (If anything it is now more obscure.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not essential -- the method will still work the same even with this change, and there should be no cost to it. Just pointing out that there was no need for it.


In reply to: 263101561 [](ancestors = 263101561)

/// so that <c><paramref name="invMap"/>[<paramref name="map"/>[i]] == i</c>,
/// and for other entries not appearing in <paramref name="map"/>,
/// <c><paramref name="invMap"/>[i] == -1</c></param>
public static void BuildSubsetMaps(DataViewSchema schema, Func<DataViewSchema.Column, bool> pred, out int[] map, out int[] invMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Func<DataViewSchema.Column, bool> pred [](start = 66, length = 38)

Hmmm. Is the goal here to enable more elaborate checking I suppose? Totally fine, just interesting.

@TomFinley
Copy link
Contributor

Reviewing this reminds me, we'll have to deal with ITransposeDataView sooner or later. But, maybe later.

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sfilipi !

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

sfilipi added 6 commits March 7, 2019 14:43
…<DataViewSchema.Column> activeColumns` from the IRowToRowMapper.GetRow and the ISchemaBoundRowMapper.GetRow.

renaming col to `columnIndex` in the  `DataViewRow.IsColumnActive` and `DataViewRow.GetGetter<TValue>`
…er and DataView.IsColumnActive

Documentation.
@sfilipi sfilipi force-pushed the GetRowRemovePredicate branch from 230ce75 to 5489b19 Compare March 7, 2019 22:47
@sfilipi sfilipi merged commit 683711a into dotnet:master Mar 8, 2019
@sfilipi sfilipi deleted the GetRowRemovePredicate branch March 8, 2019 06:16
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants