-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Towards 1529: replacing the predicates with an IEnumerable on IRowToRowMapper.GetDependencies #2504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Towards 1529: replacing the predicates with an IEnumerable on IRowToRowMapper.GetDependencies #2504
Conversation
var predicateOut = GetActiveOutputColumns(active); | ||
|
||
// Now map those to active input columns. | ||
var predicateIn = _mapper.GetDependencies(predicateOut); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call method above #Resolved
for (int i = InnerMappers.Length - 1; i >= 0; --i) | ||
toReturn = InnerMappers[i].GetDependencies(toReturn); | ||
return toReturn; | ||
columnsNeeded = columnsNeeded.Union(InnerMappers[i].GetDependencies(columnsNeeded)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union [](start = 46, length = 5)
Is this needed? The old code seems to "forget" each intermediate mapper's predicate and only return the dependencies of the first one. #Resolved
Contracts.AssertValue(dependingColumns); | ||
|
||
var active = GetActiveInput(dependingColumns); | ||
Contracts.Assert(active.Count() == Input.Count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call the method above #Resolved
var mapperColumns = Mappers[i].OutputSchema.Where(col => mapperPredicate(col.Index)); | ||
var inputColumns = Mappers[i].GetDependencies(mapperColumns); | ||
|
||
Func<int, bool> inputPredicate = c => BoundPipelines[i].OutputSchema.Count() < c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputPredicate = c => BoundPipelines[i].OutputSchema.Count() < c; [](start = 40, length = 65)
fix #Resolved
var predicateInputForMapper = bindings.RowMapper.GetDependencies(predicateMapper); | ||
// Get the active output columns | ||
var activeOutputCols = bindings.RowMapper.OutputSchema.Where(c => localMapper(c.Index)); | ||
var predicateInputForMapper = bindings.RowMapper.GetDependencies(activeOutputCols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate [](start = 16, length = 9)
rename
#Resolved
return col => false; | ||
} | ||
IEnumerable<Schema.Column> IRowToRowMapper.GetDependencies(IEnumerable<Schema.Column> dependingColumns) | ||
=> Enumerable.Repeat(FeatureColumn, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enumerable.Repeat(FeatureColumn, 1); [](start = 14, length = 37)
fix #Resolved
while (transform != null) | ||
{ | ||
var mapper = transform as IRowToRowMapper; | ||
_ectx.AssertValue(mapper); | ||
pred = mapper.GetDependencies(pred); | ||
dependingColumns = dependingColumns.Union(mapper.GetDependencies(cols)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union [](start = 56, length = 5)
remove #Resolved
@@ -252,13 +256,15 @@ public Row GetRow(Row input, Func<int, bool> active) | |||
var actives = new List<Func<int, bool>>(); | |||
var transform = _chain as IDataTransform; | |||
var activeCur = active; | |||
var activeCurCol = InputSchema.Where(col => active(col.Index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InputSchema [](start = 35, length = 11)
output #Resolved
@@ -252,13 +256,16 @@ public Row GetRow(Row input, Func<int, bool> active) | |||
var actives = new List<Func<int, bool>>(); | |||
var transform = _chain as IDataTransform; | |||
var activeCur = active; | |||
var activeCurCol = OutputSchema.Where(col => active(col.Index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activeCurCol [](start = 20, length = 12)
remove, implement without it. #Resolved
if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue) | ||
return Enumerable.Empty<Schema.Column>(); | ||
|
||
return InputSchema.Where(col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name [](start = 41, length = 58)
base it on the index #Resolved
if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue) | ||
return Enumerable.Empty<Schema.Column>(); | ||
|
||
return InputSchema.Where(col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name [](start = 41, length = 58)
base it on the index #Resolved
deps[i - 1] = InnerMappers[i].GetDependencies(deps[i]); | ||
{ | ||
var outputColumns = InnerMappers[i].OutputSchema.Where(c => deps[i](c.Index)); | ||
var cols = InnerMappers[i].GetDependencies(outputColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetDependencies [](start = 43, length = 15)
I would put ToArray to cache it. Otherwise you constantly fetch results from IEnumerable
Well techinacally only twice, one in Count one in Any. #Closed
@@ -245,11 +238,14 @@ void ISaveAsPfa.SaveAsPfa(BoundPfaContext ctx) | |||
} | |||
} | |||
|
|||
public Func<int, bool> GetDependencies(Func<int, bool> predicate) | |||
/// <summary> | |||
/// Given a set of columns, return the input columns that are needed to generate those output columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set of columns [](start = 20, length = 14)
set of output columns? #Closed
} | ||
return col => false; | ||
var columnNames = dependingColumns.Select(col => col.Name); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks excessive. I mean, you don't use it, right? #Closed
return col => col == InputRoleMappedSchema.Feature.Value.Index; | ||
} | ||
return col => false; | ||
if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue) [](start = 16, length = 77)
Micro optimization but i would switch order. #Closed
/// Given a set of columns, return the input columns that are needed to generate those output columns. | ||
/// </summary> | ||
IEnumerable<Schema.Column> IRowToRowMapper.GetDependencies(IEnumerable<Schema.Column> dependingColumns) | ||
=> dependingColumns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look few lines above we have example of different style for =>.
Can we put tab here? (not tab, 4 spaces, we are not barbarians) #Closed
{ | ||
var activeOutput = RowCursorUtils.FromColumnsToPredicate(columns, _mapper.OutputSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FromColumnsToPredicate [](start = 50, length = 22)
You will need to untangle it sooner or later :)
Right now I don't see any reason to use predicate. We get set of columns we return set of columns, and we don't call any function which required predicate.
But you can always postpone it to moment when we delete than Utils method #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, that Utils method might stay for a while, because GetActive is not public...
untangling this now:)
In reply to: 256133342 [](ancestors = 256133342)
Codecov Report
@@ Coverage Diff @@
## master #2504 +/- ##
=========================================
Coverage ? 71.67%
=========================================
Files ? 808
Lines ? 142230
Branches ? 16117
=========================================
Hits ? 101937
Misses ? 35857
Partials ? 4436
|
/// Given a set of columns, return the input columns that are needed to generate those output columns. | ||
/// </summary> | ||
IEnumerable<Schema.Column> IRowToRowMapper.GetDependencies(IEnumerable<Schema.Column> dependingColumns) | ||
=> _mapper.GetDependencies(dependingColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> [](start = 15, length = 3)
somehow this triggers me. can you add tab? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
return InputSchema.Where(col => _inputColIndices.Contains(col.Index)); | ||
|
||
//return Enumerable.Repeat(InputSchema.First(col => _inputColIndices.Contains(col.Index)), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//return Enumerable.Repeat(InputSchema.First(col => _inputColIndices.Contains(col.Index)), 1); [](start = 16, length = 94)
clean it #Closed
} | ||
return col => false; | ||
var columnNames = dependingColumns.Select(col => col.Name); | ||
return InputSchema.Where(col => columnNames.Contains(col.Name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InputSchema [](start = 23, length = 11)
was it mistake in previous code? We used to filter by OutputSchema, now it's InputSchema #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description of the function was:
"Given a predicate specifying which columns are needed, return a predicate indicating which input columns are
needed. " So i took it as : input is columns from the outputschema, and return value is columns from the input schema.
i thought the iteration is over the OutputSchema, since the predicate was over the OutputSchema.
hmm, going back to the IRowToRowMapper, summary:
" The domain of the function is defined over the indices of the columns of for ."
but InputSchema => OutputSchema makes no sense, to me?
In reply to: 256136896 [](ancestors = 256136896)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The domain of the function is defined...", I think this refers to the predicate returned by GetDependencies().
The way I understand the old code is - if any columns are active, then activate all the input columns. If this is correct, then the new code should return all the columns of InputSchema if dependingColumns is not empty, and an empty enumerable if dependingColumns is empty.
In reply to: 256140629 [](ancestors = 256140629,256136896)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
int n = _bindings.Schema.Count; | ||
var active = Utils.BuildArray(n, predicate); | ||
Contracts.Assert(active.Length == n); | ||
|
||
var activeInput = _bindings.GetActiveInput(predicate); | ||
Contracts.Assert(activeInput.Length == _bindings.InputSchema.Count); | ||
Contracts.Assert(activeInput.Count() == _bindings.InputSchema.Count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count [](start = 41, length = 5)
Is activeInput not an array? #Resolved
@@ -164,8 +163,7 @@ private bool[] GetActive(Func<int, bool> predicate, out Func<int, bool> predicat | |||
var predicateIn = _mapper.GetDependencies(predicateOut); | |||
|
|||
// Combine the two sets of input columns. | |||
predicateInput = | |||
col => 0 <= col && col < activeInput.Length && (activeInput[col] || predicateIn(col)); | |||
inputColumns = _bindings.InputSchema.Where(col => activeInput[col.Index]|| predicateIn(col.Index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
]| [](start = 83, length = 2)
Spacing, try ctrl-k-d #Resolved
GetActive(predicate, out predicateInput); | ||
return predicateInput; | ||
var predicate = RowCursorUtils.FromColumnsToPredicate(dependingColumns, OutputSchema); | ||
GetActive(predicate, out IEnumerable<Schema.Column> inputColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEnumerable<Schema.Column> [](start = 37, length = 26)
Don't be afraid of out var
. This will ultimately make @eerhardt's job of renaming this sort of thing easier if you want a selfless reason to do so. #Resolved
@@ -258,7 +262,8 @@ public Row GetRow(Row input, Func<int, bool> active) | |||
_ectx.AssertValue(mapper); | |||
mappers.Add(mapper); | |||
actives.Add(activeCur); | |||
activeCur = mapper.GetDependencies(activeCur); | |||
var activeCurCol = mapper.GetDependencies(mapper.OutputSchema.Where(col => activeCur(col.Index))); | |||
activeCur = c => activeCurCol.Any(col => col.Index == c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activeCurCol.Any(col => col.Index == c); [](start = 37, length = 40)
I don't like this usage of Any
that I've been seeing... using quadratic algorithms is probably best avoided. Did we not have a utility method to take care of this predicate mapping problem? I think we did. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal static Func<int, bool> FromColumnsToPredicate(IEnumerable<Schema.Column> columnsNeeded, Schema sourceSchema)
, you called it, in RowCursorUtils
.
In reply to: 256593471 [](ancestors = 256593471)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8da2253
to
3a7d3da
Compare
if (!InputRoleMappedSchema.Feature.HasValue || dependingColumns.Count() == 0) | ||
return Enumerable.Empty<Schema.Column>(); | ||
|
||
return InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index); [](start = 23, length = 81)
Isn't this InputRoleMappedSchema.Feature? #Resolved
if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue) | ||
return Enumerable.Empty<Schema.Column>(); | ||
|
||
return InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index); [](start = 23, length = 81)
Here too. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -589,6 +589,16 @@ public bool[] GetActive(Func<int, bool> predicate) | |||
return Utils.BuildArray(ColumnCount, predicate); | |||
} | |||
|
|||
/// <summary> | |||
/// The given predicate maps from output column index to whether the column is active. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate [](start = 22, length = 9)
Update the comment. #Resolved
@@ -609,6 +619,19 @@ public bool[] GetActiveInput(Func<int, bool> predicate) | |||
return active; | |||
} | |||
|
|||
/// <summary> | |||
/// The given predicate maps from output column index to whether the column is active. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate [](start = 22, length = 9)
Update the comment. #Resolved
@@ -763,6 +786,18 @@ public bool[] GetActiveInput(Func<int, bool> predicate) | |||
} | |||
return active; | |||
} | |||
|
|||
/// <summary> | |||
/// The given predicate maps from output column index to whether the column is active. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate [](start = 22, length = 9)
Update the comment. #Resolved
5849d81
to
95a5518
Compare
…owMapper.GetDependencies
… of scope, and adjusting the score selecting predicate in GetScorers
… IRowToRowMapperBase. Renaming ISchemaBoundRowMapper.GetDependencies to GetDependenciesForNewColumns
@@ -15,7 +16,15 @@ namespace Microsoft.ML.Data | |||
/// so to rebind, the same input column names must be used. | |||
/// Implementations of this interface are typically created over defined input <see cref="DataViewSchema"/>. | |||
/// </summary> | |||
public interface IRowToRowMapper | |||
public interface IRowToRowMapper : IRowToRowMapperBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IRowToRowMapperBase [](start = 39, length = 19)
Can we please not introduce this interface? As far as I see, it exists so that we can have a common "descent" point for IRowToRowMapper
and ISchemaBoundRowMapper
. But the latter is internal and should remain so. I would therefore prefer that there be no relationship between the two. There is no reason for IRowToRowMapperBase
to have a relationship with ISchemaBoundRowMapper
as there is for it to have a relationship with IValueMapper
. (BTW, in spirit, ISchemaBoundRowMapper
is far closer to IValueMapper
). So let's keep our inheritance structure clean. I do not want IRowToRowMapper
to be complicated in this fashion. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So: we'll have separate interfaces, and not pretend that they're somehow related, when in fact they really are not? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created the base interface, to reduce code redundancy. Without it, the ISchemaBoundRowMapper looks just like the IRowToRowMapper.
Rewinding the change.
In reply to: 260019470 [](ancestors = 260019470)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it, but the objection of @yaeldekel is that it doesn't act like it, and that is deliberate and desirable, as far as I can tell. And if it impacts out public surface we have to hold it I think to a higher standard. Anyway, thanks for looking at this.
In reply to: 260026858 [](ancestors = 260026858,260019470)
@@ -58,12 +58,17 @@ internal interface ISchemaBoundMapper | |||
/// This interface combines <see cref="ISchemaBoundMapper"/> with <see cref="IRowToRowMapper"/>. | |||
/// </summary> | |||
[BestFriend] | |||
internal interface ISchemaBoundRowMapper : ISchemaBoundMapper, IRowToRowMapper | |||
internal interface ISchemaBoundRowMapper : ISchemaBoundMapper, IRowToRowMapperBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IRowToRowMapperBase [](start = 67, length = 19)
This is what I feel needs to stop here. We have this descent here, but the output schema meaning is rather different. #Resolved
95a5518
to
383c26f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @sfilipi !
More work towards #1529.
Marked the pr as still working on it, because there is one test failing: TestAndPredictoOnIris; double-checking the changes on the CompositeRowToRowMapper.