-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Replace predicate with an IEnumerable<DataViewSchema.Column> for IRowToRowMapper.GetRow and ISchemaBoundRowMapper.GetRow #2796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
/// <summary> | ||
/// Returns a value getter delegate to fetch the given column value from the row. | ||
/// Returns a value getter delegate to fetch the valueof column with the given columnIndex, from the row. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valueof [](start = 57, length = 7)
space #Resolved
|
||
/// <summary> | ||
/// Returns a value getter delegate to fetch the given column value from the row. | ||
/// Returns a value getter delegate to fetch the valueof column with the given columnIndex, from the row. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
columnIndex [](start = 87, length = 11)
paramref? #Resolved
public override ValueGetter<TValue> GetGetter<TValue>(int col) => _annotations.GetGetter<TValue>(col); | ||
|
||
/// <summary> | ||
/// Returns a value getter delegate to fetch the valueof column with the given columnIndex, from the row. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valueof [](start = 61, length = 7)
space, I guess across all files. #Resolved
Ch.Check(IsColumnActive(col), "The column must be active against the defined predicate."); | ||
if (!(Getters[col] is ValueGetter<TValue>)) | ||
Ch.Check(IsColumnActive(columnIndex), "The column must be active against the defined predicate."); | ||
if (!(Getters[columnIndex] is ValueGetter<TValue>)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ch.CheckParam(IsColumnActive(columnIndex), nameof(columnIndex), "requested column not active");
from transposeLoader.cs
I like it more than some mythical defined predicate
#Resolved
Ch.Check(0 <= col && col < Schema.Count, "Column index is out of range"); | ||
return Getters[col] != null; | ||
Ch.Check(0 <= columnIndex && columnIndex < Schema.Count, "Column index is out of range"); | ||
return Getters[columnIndex] != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ch.CheckParam(0 <= columnIndex && columnIndex < _colToActive.Length, nameof(columnIndex));
in files above this one we use this format. consistency would be nice. not a blocker tho. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ch.Check(_active[col], "column is not active"); | ||
var column = _view._columns[col] as Column<TValue>; | ||
Ch.Check(0 <= columnIndex & columnIndex < Schema.Count); | ||
Ch.Check(_active[columnIndex], "column is not active"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"column is not active" [](start = 51, length = 22)
for the sake of consistency: "requested column not active" #Resolved
throw Ch.Except("Column #{0} is requested but not active in the cursor", col); | ||
var getter = _getters[_colToActivesIndex[col]] as ValueGetter<TValue>; | ||
if (!IsColumnActive(columnIndex)) | ||
throw Ch.Except("Column #{0} is requested but not active in the cursor", columnIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in how many different ways we can do same thing :)
Ch.CheckParam(IsColumnActive(columnIndex), nameof(columnIndex), "requested column not active");
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -49,18 +49,19 @@ IEnumerable<DataViewSchema.Column> IRowToRowMapper.GetDependencies(IEnumerable<D | |||
return columnsNeeded; | |||
} | |||
|
|||
public DataViewRow GetRow(DataViewRow input, Func<int, bool> active) | |||
DataViewRow IRowToRowMapper.GetRow(DataViewRow input, IEnumerable<DataViewSchema.Column> activeColumns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEnumerable<DataViewSchema.Column> activeColumns [](start = 62, length = 48)
It's probably quite late considering how much work you did already for this issue.
So we replace functions to schema columns.
In past we had just indexes and we just check are they belong to schema range or not.
Now we accept schema columns. But we use only index position in them.
Which can lead to situation where i have two dataviews with following schemas:
one is A: int, B: float
, and another is C:string, D: bool
.
I want to get rows from first dataview. but I made mistake and I pass schema from second dataview as active columns.
I would assume it would work perfectly fine in current state of the code. I'm not sure it should.
#Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
if (predicate(OutputSchema.Count - 1)) | ||
|
||
if (activeColumns.Select(c => c.Name).Contains(DefaultColumnNames.Probability)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (activeColumns.Select(c => c.Name).Contains(DefaultColumnNames.Probability)) [](start = 15, length = 80)
I would rather change if inside foreach loop to set bool flag to true if column.Index == OutputSchema.Count-1 instead of continue and check that flag here.
You have mix of two different conditions right now.
And who knows maybe we will allow user to define their own names for probability column in future, so index feels somewhat more secure. #Resolved
@@ -385,9 +385,7 @@ public DataViewRow GetRow(DataViewRow input, Func<int, bool> active) | |||
} | |||
|
|||
public Func<int, bool> GetGenericPredicate(Func<int, bool> predicate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetGenericPredicate [](start = 35, length = 19)
is it still needed? #Resolved
var totalColumnsCount = 1 + _outputGenericSchema.Count; | ||
var getters = new Delegate[totalColumnsCount]; | ||
|
||
if (active(totalColumnsCount - 1)) | ||
if (activeColumns.Select(c => c.Index).Contains(totalColumnsCount - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totalColumnsCount - 1 [](start = 64, length = 21)
this is _outputGenericSchema.Count :) #Resolved
{ | ||
Ch.Check(0 <= col & col < _bindings.ColumnCount, "col"); | ||
Ch.Check(0 <= columnIndex & columnIndex < _bindings.ColumnCount, "col"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"col" [](start = 77, length = 5)
nameof! #Resolved
{ | ||
Func<int, bool> activeInfos = | ||
iinfo => | ||
{ | ||
int col = _bindings.MapIinfoToCol(iinfo); | ||
return active(col); | ||
return activeColumns.Select(c => c.Index).Contains(col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activeColumns.Select(c => c.Index) [](start = 27, length = 34)
Make a hashset before you define activeInfos and use hashSet.Contains here.
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
IRowToRowMapper[] innerMappers = new IRowToRowMapper[0]; | ||
if (mapper is CompositeRowToRowMapper compositeMapper) | ||
innerMappers = compositeMapper.InnerMappers; | ||
|
||
var activeIndices = activeColumns.Select(c => c.Index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToArray or HashSet. or Utils.BuildArray #Resolved
{ | ||
Func<int, bool> activeInfos = | ||
iinfo => | ||
{ | ||
int col = _bindings.MapIinfoToCol(iinfo); | ||
return active(col); | ||
return activeColumns.Select(c => c.Index).Contains(col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activeColumns.Select(c => c.Index) [](start = 27, length = 34)
hashset #Resolved
Func<int, bool> activeInfos = iinfo => active(bindings.MapIinfoToCol(iinfo)); | ||
GetActive(bindings, activeColumns, out inputColumns, out IEnumerable<DataViewSchema.Column> activeMapperColumns); | ||
var output = bindings.RowMapper.GetRow(input, activeMapperColumns); | ||
Func<int, bool> activeInfos = iinfo => activeColumns.Select(c => c.Index).Contains(bindings.MapIinfoToCol(iinfo)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activeColumns [](start = 51, length = 13)
this can be quite heavy #Pending
/// </summary> | ||
public abstract bool IsColumnActive(int col); | ||
public abstract bool IsColumnActive(int columnIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were changing these two methods to take in the column instead of the index?
public abstract bool IsColumnActive(DataViewSchema.Column column);
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the predicates were deemed hard to debug. Ideally we'd pass the column and not deal with the indexes but is that a V1 issue? @shauheen @glebuk @TomFinley #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue being closed here says:
The use of integer indices here has sometimes led to confusion or even bugs.
Which seems to me that we shouldn't have integer indices in our public API. So if we ship V1 like this, we won't be able to remove/change this API. We can add a new virtual
method that takes in the Column
, but this current API will need to live forever. #Resolved
0ccbde7
to
7aa80a1
Compare
@@ -1153,7 +1155,7 @@ private static IDataView AddVarLengthColumn<TSrc>(IHostEnvironment env, IDataVie | |||
throw Contracts.Except("Multiple unweighted rows found in metrics data view."); | |||
|
|||
numResults++; | |||
UpdateSums(isWeightedCol, stratCol, stratVal, agg, numFolds > 1, metricNames, hasWeighted, hasStrats, | |||
UpdateSums(isWeightedColumn.Value.Index, stratCol, stratVal, agg, numFolds > 1, metricNames, hasWeighted, hasStrats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch to isWeightedCol #Resolved
@@ -132,16 +132,18 @@ public abstract class DataViewRow : IDisposable | |||
public abstract ValueGetter<DataViewRowId> GetIdGetter(); | |||
|
|||
/// <summary> | |||
/// Returns whether the given column is active in this row. | |||
/// Returns whether the give column is active in this row. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix #Resolved
@@ -523,7 +523,7 @@ private DataViewRowCursor[] SplitCore(IChannelProvider ch, DataViewRowCursor inp | |||
// Create the mappings between active column index, and column index. | |||
int[] activeToCol; | |||
int[] colToActive; | |||
Utils.BuildSubsetMaps(_schema.Count, input.IsColumnActive, out activeToCol, out colToActive); | |||
Utils.BuildSubsetMaps(_schema.Count, _schema, input.IsColumnActive, out activeToCol, out colToActive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_schema.Count [](start = 38, length = 13)
try removing the count, if we're passing schema #Resolved
/// This throws if the column is not active in this row, or if the type | ||
/// <typeparamref name="TValue"/> differs from this column's type. | ||
/// </summary> | ||
public abstract ValueGetter<TValue> GetGetter<TValue>(int col); | ||
/// <typeparam name="TValue"> is the output column's content type.</typeparam> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Is it confusing to say output column
here and below? To the consumer of an IDataView, they are just column
s. #Resolved
@@ -50,7 +50,7 @@ public AnnotationsDebuggerProxy(DataViewSchema.Annotations annotations) | |||
private static object GetValue<T>(DataViewSchema.Annotations annotations, int columnIndex) | |||
{ | |||
T value = default; | |||
annotations.GetGetter<T>(columnIndex)(ref value); | |||
annotations.GetGetter<T>(annotations.Schema[columnIndex])(ref value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is only called in one place - which already has the Column. Can we just pass the column through here? #Closed
@@ -41,16 +41,16 @@ public AnnotationsDebuggerProxy(DataViewSchema.Annotations annotations) | |||
foreach (var column in annotations.Schema) | |||
{ | |||
var name = column.Name; | |||
var value = Utils.MarshalInvoke(GetValue<int>, column.Type.RawType, annotations, column.Index); | |||
var value = Utils.MarshalInvoke(GetValue<DataViewSchema.Column>, column.Type.RawType, annotations, column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataViewSchema.Column [](start = 57, length = 21)
Incidentally this was not necessary or helpful. The point is to merely have a method for the method to sink its teeth into, so the type parameter here was not informative. (If anything it is now more obscure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not essential -- the method will still work the same even with this change, and there should be no cost to it. Just pointing out that there was no need for it.
In reply to: 263101561 [](ancestors = 263101561)
/// so that <c><paramref name="invMap"/>[<paramref name="map"/>[i]] == i</c>, | ||
/// and for other entries not appearing in <paramref name="map"/>, | ||
/// <c><paramref name="invMap"/>[i] == -1</c></param> | ||
public static void BuildSubsetMaps(DataViewSchema schema, Func<DataViewSchema.Column, bool> pred, out int[] map, out int[] invMap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Func<DataViewSchema.Column, bool> pred [](start = 66, length = 38)
Hmmm. Is the goal here to enable more elaborate checking I suppose? Totally fine, just interesting.
Reviewing this reminds me, we'll have to deal with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @sfilipi !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…<DataViewSchema.Column> activeColumns` from the IRowToRowMapper.GetRow and the ISchemaBoundRowMapper.GetRow. renaming col to `columnIndex` in the `DataViewRow.IsColumnActive` and `DataViewRow.GetGetter<TValue>`
…er and DataView.IsColumnActive Documentation.
… take a column, rather than column index.
…lumn instead of the column index.
230ce75
to
5489b19
Compare
Closes #1529
Note to reviewers: The first commit does all the replacement work.
The second commit, has more files on it, because it deals with renaming the col to columnIndex in the other two methods of IDataView (@eerhardt comment on #1529)
The third commit is me reviewing the first commit, and making sure that GetRow is an explicit implementation everywhere.
I have had most of the work on this PR done before breaking ISchemaBoundRowMapper deriving from IRowToRowMapper; so I kept the signature change of ISchemaBoundRowMapper.GetRow although that interface is internal. LMK if that is not desirable.