Towards 1529: replacing the predicates with an IEnumerable on IRowToRowMapper.GetDependencies #2504

Merged
12 commits merged on Feb 26, 2019

Member

@sfilipi sfilipi commented Feb 11, 2019

More work towards #1529.

Marked the PR as work in progress because one test is still failing: TestAndPredictoOnIris. Double-checking the changes to CompositeRowToRowMapper.

var predicateOut = GetActiveOutputColumns(active);

// Now map those to active input columns.
var predicateIn = _mapper.GetDependencies(predicateOut);
Member Author

@sfilipi sfilipi Feb 11, 2019

call method above #Resolved

for (int i = InnerMappers.Length - 1; i >= 0; --i)
toReturn = InnerMappers[i].GetDependencies(toReturn);
return toReturn;
columnsNeeded = columnsNeeded.Union(InnerMappers[i].GetDependencies(columnsNeeded));
@yaeldekel yaeldekel Feb 11, 2019

Union [](start = 46, length = 5)

Is this needed? The old code seems to "forget" each intermediate mapper's predicate and only return the dependencies of the first one. #Resolved
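
To make the question above concrete, here is a minimal sketch of the two loop shapes from the diff. It does not use the ML.NET types: columns are plain strings, and `MakeMapper`, `Chain`, and `ChainWithUnion` are hypothetical names invented for this illustration. Under those assumptions, passing each mapper's result to the previous mapper yields only columns of the first mapper's input, while `Union` also keeps intermediate columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a mapper: given the needed output column names,
// return the input column names required to produce them. A column the mapper
// does not produce itself is passed through unchanged.
Func<IEnumerable<string>, IEnumerable<string>> MakeMapper(string produced, params string[] sources) =>
    needed => needed.SelectMany(c => c == produced ? sources : new[] { c }).Distinct();

// Pipeline: the first mapper computes B from A, the second computes C from B.
var mappers = new[] { MakeMapper("B", "A"), MakeMapper("C", "B") };

// Walk the chain backwards, feeding each mapper's dependencies to the previous mapper.
IEnumerable<string> Chain(IEnumerable<string> needed)
{
    for (int i = mappers.Length - 1; i >= 0; --i)
        needed = mappers[i](needed);
    return needed.ToArray();
}

// Same walk, but accumulating with Union at every step.
IEnumerable<string> ChainWithUnion(IEnumerable<string> needed)
{
    for (int i = mappers.Length - 1; i >= 0; --i)
        needed = needed.Union(mappers[i](needed));
    return needed.ToArray();
}

Console.WriteLine(string.Join(",", Chain(new[] { "C" })));
Console.WriteLine(string.Join(",", ChainWithUnion(new[] { "C" })));
```

In this toy pipeline the `Union` variant returns B and C alongside A, even though neither exists in the first mapper's input, which is one way to read the concern raised in the comment.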

@sfilipi sfilipi self-assigned this Feb 12, 2019
@sfilipi sfilipi added the API Issues pertaining the friendly API label Feb 12, 2019
@sfilipi sfilipi added this to the 0219 milestone Feb 12, 2019
Contracts.AssertValue(dependingColumns);

var active = GetActiveInput(dependingColumns);
Contracts.Assert(active.Count() == Input.Count);
Member Author

@sfilipi sfilipi Feb 12, 2019

call the method above #Resolved

var mapperColumns = Mappers[i].OutputSchema.Where(col => mapperPredicate(col.Index));
var inputColumns = Mappers[i].GetDependencies(mapperColumns);

Func<int, bool> inputPredicate = c => BoundPipelines[i].OutputSchema.Count() < c;
Member Author

@sfilipi sfilipi Feb 12, 2019

inputPredicate = c => BoundPipelines[i].OutputSchema.Count() < c; [](start = 40, length = 65)

fix #Resolved

Member Author

sfilipi commented Feb 12, 2019

        }

remove #Resolved


Refers to: src/Microsoft.ML.Data/Transforms/ColumnSelecting.cs:709 in 5b99fe2. [](commit_id = 5b99fe2, deletion_comment = False)

var predicateInputForMapper = bindings.RowMapper.GetDependencies(predicateMapper);
// Get the active output columns
var activeOutputCols = bindings.RowMapper.OutputSchema.Where(c => localMapper(c.Index));
var predicateInputForMapper = bindings.RowMapper.GetDependencies(activeOutputCols);
Member Author

@sfilipi sfilipi Feb 12, 2019

predicate [](start = 16, length = 9)

rename
#Resolved

return col => false;
}
IEnumerable<Schema.Column> IRowToRowMapper.GetDependencies(IEnumerable<Schema.Column> dependingColumns)
=> Enumerable.Repeat(FeatureColumn, 1);
Member Author

@sfilipi sfilipi Feb 12, 2019

Enumerable.Repeat(FeatureColumn, 1); [](start = 14, length = 37)

fix #Resolved

while (transform != null)
{
var mapper = transform as IRowToRowMapper;
_ectx.AssertValue(mapper);
pred = mapper.GetDependencies(pred);
dependingColumns = dependingColumns.Union(mapper.GetDependencies(cols));
Member Author

@sfilipi sfilipi Feb 12, 2019

Union [](start = 56, length = 5)

remove #Resolved

@@ -252,13 +256,15 @@ public Row GetRow(Row input, Func<int, bool> active)
var actives = new List<Func<int, bool>>();
var transform = _chain as IDataTransform;
var activeCur = active;
var activeCurCol = InputSchema.Where(col => active(col.Index));
Member Author

@sfilipi sfilipi Feb 12, 2019

InputSchema [](start = 35, length = 11)

output #Resolved

@@ -252,13 +256,16 @@ public Row GetRow(Row input, Func<int, bool> active)
var actives = new List<Func<int, bool>>();
var transform = _chain as IDataTransform;
var activeCur = active;
var activeCurCol = OutputSchema.Where(col => active(col.Index));
Member Author

@sfilipi sfilipi Feb 12, 2019

activeCurCol [](start = 20, length = 12)

remove, implement without it. #Resolved

if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue)
return Enumerable.Empty<Schema.Column>();

return InputSchema.Where(col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name));
Member Author

@sfilipi sfilipi Feb 12, 2019

col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name [](start = 41, length = 58)

base it on the index #Resolved

if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue)
return Enumerable.Empty<Schema.Column>();

return InputSchema.Where(col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name));
Member Author

@sfilipi sfilipi Feb 12, 2019

col => col.Name.Equals(InputRoleMappedSchema.Feature?.Name [](start = 41, length = 58)

base it on the index #Resolved

@sfilipi sfilipi changed the title [WIP] towards 1529: replacing the predicates with an IEnumerable on IRowToRowMapper.GetDependencies Towards 1529: replacing the predicates with an IEnumerable on IRowToRowMapper.GetDependencies Feb 12, 2019
deps[i - 1] = InnerMappers[i].GetDependencies(deps[i]);
{
var outputColumns = InnerMappers[i].OutputSchema.Where(c => deps[i](c.Index));
var cols = InnerMappers[i].GetDependencies(outputColumns);
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

GetDependencies [](start = 43, length = 15)

I would add a ToArray to cache it. Otherwise you fetch the results from the IEnumerable every time it is used.
Well, technically only twice: once in Count and once in Any. #Closed
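
A small sketch of the caching point, under stated assumptions: `GetDependencies` below is a hypothetical local iterator written for this example, not the ML.NET method, and the `enumerations` counter exists only to make the repeated work visible. Because an iterator runs again for every consumer, calling both `Count()` and `Any()` on the lazy result walks the source twice, while `ToArray()` walks it once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerations = 0;

// Hypothetical lazy dependency lookup. The body runs from the top
// every time a caller starts enumerating the returned sequence.
IEnumerable<int> GetDependencies(int[] outputColumns)
{
    enumerations++;
    foreach (int column in outputColumns)
        yield return column;
}

// Lazy result: Count() and Any() each start their own enumeration.
var lazy = GetDependencies(new[] { 1, 2, 3 });
int lazyCount = lazy.Count();
bool lazyAny = lazy.Any();
int lazyEnumerations = enumerations;

// Cached result: ToArray() enumerates once, later queries read the array.
enumerations = 0;
int[] cached = GetDependencies(new[] { 1, 2, 3 }).ToArray();
int cachedCount = cached.Length;
bool cachedAny = cached.Any();
int cachedEnumerations = enumerations;

Console.WriteLine($"lazy: {lazyEnumerations} enumerations, cached: {cachedEnumerations}");
```

Whether the extra pass matters depends on how expensive the underlying lookup is. For a cheap lookup the difference is small, which matches the "only twice" remark.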

@@ -245,11 +238,14 @@ void ISaveAsPfa.SaveAsPfa(BoundPfaContext ctx)
}
}

public Func<int, bool> GetDependencies(Func<int, bool> predicate)
/// <summary>
/// Given a set of columns, return the input columns that are needed to generate those output columns.
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

set of columns [](start = 20, length = 14)

set of output columns? #Closed

}
return col => false;
var columnNames = dependingColumns.Select(col => col.Name);

Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

Looks excessive. I mean, you don't use it, right? #Closed

return col => col == InputRoleMappedSchema.Feature.Value.Index;
}
return col => false;
if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue)
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue) [](start = 16, length = 77)

Micro-optimization, but I would switch the order. #Closed

/// Given a set of columns, return the input columns that are needed to generate those output columns.
/// </summary>
IEnumerable<Schema.Column> IRowToRowMapper.GetDependencies(IEnumerable<Schema.Column> dependingColumns)
=> dependingColumns;
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

If you look a few lines above, we have an example of a different style for =>.
Can we put a tab here? (Not a tab, 4 spaces, we are not barbarians.) #Closed

{
var activeOutput = RowCursorUtils.FromColumnsToPredicate(columns, _mapper.OutputSchema);
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

FromColumnsToPredicate [](start = 50, length = 22)

You will need to untangle it sooner or later :)
Right now I don't see any reason to use a predicate. We get a set of columns, we return a set of columns, and we don't call any function that requires a predicate.
But you can always postpone it to the moment when we delete that Utils method. #Closed

Member Author

You know, that Utils method might stay for a while, because GetActive is not public...

untangling this now:)


In reply to: 256133342 [](ancestors = 256133342)

codecov bot commented Feb 12, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@f6d55f3).
The diff coverage is 77.77%.

@@            Coverage Diff            @@
##             master    #2504   +/-   ##
=========================================
  Coverage          ?   71.67%           
=========================================
  Files             ?      808           
  Lines             ?   142230           
  Branches          ?    16117           
=========================================
  Hits              ?   101937           
  Misses            ?    35857           
  Partials          ?     4436
Flag          Coverage Δ
#Debug        71.67% <77.77%> (?)
#production   67.92% <77.77%> (?)
#test         85.85% <ø> (?)

Impacted Files                                           Coverage Δ
...st/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs    100% <ø> (ø)
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs    90.9% <ø> (ø)
.../Scenarios/Api/Estimators/SimpleTrainAndPredict.cs    94.73% <ø> (ø)
src/Microsoft.ML.Data/Transforms/NopTransform.cs         17.64% <0%> (ø)
...Microsoft.ML.Transforms/OptionalColumnTransform.cs    27.77% <0%> (ø)
...rosoft.ML.Data/DataView/RowToRowMapperTransform.cs    92.3% <100%> (ø)
...soft.ML.Data/Scorers/MultiClassClassifierScorer.cs    59.64% <100%> (ø)
....ML.Data/Scorers/FeatureContributionCalculation.cs    94.32% <100%> (ø)
...rc/Microsoft.ML.Data/Transforms/ColumnSelecting.cs    97.32% <100%> (ø)
src/Microsoft.ML.TimeSeries/PredictionFunction.cs        87.5% <100%> (ø)
... and 12 more

/// Given a set of columns, return the input columns that are needed to generate those output columns.
/// </summary>
IEnumerable<Schema.Column> IRowToRowMapper.GetDependencies(IEnumerable<Schema.Column> dependingColumns)
=> _mapper.GetDependencies(dependingColumns);
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

=> [](start = 15, length = 3)

Somehow this triggers me. Can you add a tab? #Closed

Member Author

I will add 4 spaces ;)


In reply to: 256133529 [](ancestors = 256133529)


return InputSchema.Where(col => _inputColIndices.Contains(col.Index));

//return Enumerable.Repeat(InputSchema.First(col => _inputColIndices.Contains(col.Index)), 1);
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

//return Enumerable.Repeat(InputSchema.First(col => _inputColIndices.Contains(col.Index)), 1); [](start = 16, length = 94)

clean it #Closed

}
return col => false;
var columnNames = dependingColumns.Select(col => col.Name);
return InputSchema.Where(col => columnNames.Contains(col.Name));
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 12, 2019

InputSchema [](start = 23, length = 11)

Was it a mistake in the previous code? We used to filter by OutputSchema, now it's InputSchema. #Closed

Member Author

@sfilipi sfilipi Feb 12, 2019

The description of the function was:

"Given a predicate specifying which columns are needed, return a predicate indicating which input columns are
needed." So I took it as: the input is columns from the output schema, and the return value is columns from the input schema.

I thought the iteration was over the OutputSchema, since the predicate was over the OutputSchema.

Hmm, going back to the IRowToRowMapper summary:

" The domain of the function is defined over the indices of the columns of for ."

but InputSchema => OutputSchema makes no sense, to me?


In reply to: 256136896 [](ancestors = 256136896)

"The domain of the function is defined...", I think this refers to the predicate returned by GetDependencies().

The way I understand the old code is - if any columns are active, then activate all the input columns. If this is correct, then the new code should return all the columns of InputSchema if dependingColumns is not empty, and an empty enumerable if dependingColumns is empty.


In reply to: 256140629 [](ancestors = 256140629,256136896)
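
The all-or-nothing reading described in this reply can be sketched as follows. This is an illustration under assumptions, not the merged code: columns are modeled as `(Index, Name)` tuples instead of `Schema.Column`, and `inputSchema` and this `GetDependencies` are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the mapper's input schema.
var inputSchema = new[] { (Index: 0, Name: "Label"), (Index: 1, Name: "Features") };

// If any output column is requested, every input column is a dependency.
// If nothing is requested, nothing is needed.
IEnumerable<(int Index, string Name)> GetDependencies(
    IEnumerable<(int Index, string Name)> dependingColumns) =>
    dependingColumns.Any() ? inputSchema : Enumerable.Empty<(int Index, string Name)>();

var requested = new[] { (Index: 5, Name: "Score") };
Console.WriteLine(GetDependencies(requested).Count());
Console.WriteLine(GetDependencies(Array.Empty<(int Index, string Name)>()).Count());
```

Note that the result ignores which output columns were requested. Only whether the request is empty matters, which is the behavior the reply attributes to the old predicate-based code.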

Member Author

thanks @yaeldekel


In reply to: 256199343 [](ancestors = 256199343,256140629,256136896)

{
int n = _bindings.Schema.Count;
var active = Utils.BuildArray(n, predicate);
Contracts.Assert(active.Length == n);

var activeInput = _bindings.GetActiveInput(predicate);
Contracts.Assert(activeInput.Length == _bindings.InputSchema.Count);
Contracts.Assert(activeInput.Count() == _bindings.InputSchema.Count);
@yaeldekel yaeldekel Feb 13, 2019

Count [](start = 41, length = 5)

Is activeInput not an array? #Resolved

@@ -164,8 +163,7 @@ private bool[] GetActive(Func<int, bool> predicate, out Func<int, bool> predicat
var predicateIn = _mapper.GetDependencies(predicateOut);

// Combine the two sets of input columns.
predicateInput =
col => 0 <= col && col < activeInput.Length && (activeInput[col] || predicateIn(col));
inputColumns = _bindings.InputSchema.Where(col => activeInput[col.Index]|| predicateIn(col.Index));
Contributor

@TomFinley TomFinley Feb 13, 2019

]| [](start = 83, length = 2)

Spacing, try ctrl-k-d #Resolved

GetActive(predicate, out predicateInput);
return predicateInput;
var predicate = RowCursorUtils.FromColumnsToPredicate(dependingColumns, OutputSchema);
GetActive(predicate, out IEnumerable<Schema.Column> inputColumns);
Contributor

@TomFinley TomFinley Feb 13, 2019

IEnumerable<Schema.Column> [](start = 37, length = 26)

Don't be afraid of out var. This will ultimately make @eerhardt's job of renaming this sort of thing easier if you want a selfless reason to do so. #Resolved

@@ -258,7 +262,8 @@ public Row GetRow(Row input, Func<int, bool> active)
_ectx.AssertValue(mapper);
mappers.Add(mapper);
actives.Add(activeCur);
activeCur = mapper.GetDependencies(activeCur);
var activeCurCol = mapper.GetDependencies(mapper.OutputSchema.Where(col => activeCur(col.Index)));
activeCur = c => activeCurCol.Any(col => col.Index == c);
Contributor

@TomFinley TomFinley Feb 13, 2019

activeCurCol.Any(col => col.Index == c); [](start = 37, length = 40)

I don't like this usage of Any that I've been seeing... using quadratic algorithms is probably best avoided. Did we not have a utility method to take care of this predicate mapping problem? I think we did. #Resolved

Contributor

internal static Func<int, bool> FromColumnsToPredicate(IEnumerable<Schema.Column> columnsNeeded, Schema sourceSchema), you called it, in RowCursorUtils.


In reply to: 256593471 [](ancestors = 256593471)
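
For readers unfamiliar with the concern, the sketch below contrasts the `Any`-based predicate with a lookup-table predicate. It is written under assumptions: `ToPredicate` is a hypothetical helper that works on bare column indices, and it is not claimed to match the body of `RowCursorUtils.FromColumnsToPredicate`, whose implementation is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: mark the needed indices once in a bool array,
// then answer each predicate call with a constant-time array read.
Func<int, bool> ToPredicate(IEnumerable<int> neededIndices, int schemaCount)
{
    var active = new bool[schemaCount];
    foreach (int index in neededIndices)
    {
        if (index < 0 || index >= schemaCount)
            throw new ArgumentOutOfRangeException(nameof(neededIndices));
        active[index] = true;
    }
    return c => 0 <= c && c < active.Length && active[c];
}

var needed = new[] { 0, 2 };

// Pattern from the diff: every call rescans the whole sequence, so evaluating
// it for each of n columns costs O(n * m) for m needed columns.
Func<int, bool> slow = c => needed.Any(index => index == c);

// Lookup-table version: O(m) setup, O(1) per call.
Func<int, bool> fast = ToPredicate(needed, 4);

bool sameAnswers = Enumerable.Range(0, 4).All(c => slow(c) == fast(c));
Console.WriteLine(sameAnswers);
```

Both predicates agree on every column index. The difference is only in how much work each call does, which grows with the schema width in the `Any` version.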

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

@sfilipi sfilipi force-pushed the getDependanciesRemovePredicates branch from 8da2253 to 3a7d3da Compare February 13, 2019 22:20
if (!InputRoleMappedSchema.Feature.HasValue || dependingColumns.Count() == 0)
return Enumerable.Empty<Schema.Column>();

return InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index);
@yaeldekel yaeldekel Feb 13, 2019

InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index); [](start = 23, length = 81)

Isn't this InputRoleMappedSchema.Feature? #Resolved

if (dependingColumns.Count() == 0 || !InputRoleMappedSchema.Feature.HasValue)
return Enumerable.Empty<Schema.Column>();

return InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index);
@yaeldekel yaeldekel Feb 13, 2019

InputSchema.Where(col => col.Index == InputRoleMappedSchema.Feature.Value.Index); [](start = 23, length = 81)

Here too. #Resolved

Member Author

Duh!
thanks :)


In reply to: 256620362 [](ancestors = 256620362)

@@ -589,6 +589,16 @@ public bool[] GetActive(Func<int, bool> predicate)
return Utils.BuildArray(ColumnCount, predicate);
}

/// <summary>
/// The given predicate maps from output column index to whether the column is active.
@yaeldekel yaeldekel Feb 13, 2019

predicate [](start = 22, length = 9)

Update the comment. #Resolved

@@ -609,6 +619,19 @@ public bool[] GetActiveInput(Func<int, bool> predicate)
return active;
}

/// <summary>
/// The given predicate maps from output column index to whether the column is active.
@yaeldekel yaeldekel Feb 13, 2019

predicate [](start = 22, length = 9)

Update the comment. #Resolved

@@ -763,6 +786,18 @@ public bool[] GetActiveInput(Func<int, bool> predicate)
}
return active;
}

/// <summary>
/// The given predicate maps from output column index to whether the column is active.
@yaeldekel yaeldekel Feb 13, 2019

predicate [](start = 22, length = 9)

Update the comment. #Resolved

@sfilipi sfilipi force-pushed the getDependanciesRemovePredicates branch from 5849d81 to 95a5518 Compare February 25, 2019 19:55
@@ -15,7 +16,15 @@ namespace Microsoft.ML.Data
/// so to rebind, the same input column names must be used.
/// Implementations of this interface are typically created over defined input <see cref="DataViewSchema"/>.
/// </summary>
public interface IRowToRowMapper
public interface IRowToRowMapper : IRowToRowMapperBase
Contributor

@TomFinley TomFinley Feb 25, 2019

IRowToRowMapperBase [](start = 39, length = 19)

Can we please not introduce this interface? As far as I can see, it exists so that we can have a common "descent" point for IRowToRowMapper and ISchemaBoundRowMapper. But the latter is internal and should remain so. I would therefore prefer that there be no relationship between the two. There is no more reason for IRowToRowMapperBase to have a relationship with ISchemaBoundRowMapper than there is for it to have a relationship with IValueMapper. (BTW, in spirit, ISchemaBoundRowMapper is far closer to IValueMapper.) So let's keep our inheritance structure clean. I do not want IRowToRowMapper to be complicated in this fashion. #Resolved

Contributor

@TomFinley TomFinley Feb 25, 2019

So: we'll have separate interfaces, and not pretend that they're somehow related, when in fact they really are not? #Resolved

Member Author

@sfilipi sfilipi Feb 25, 2019

I created the base interface, to reduce code redundancy. Without it, the ISchemaBoundRowMapper looks just like the IRowToRowMapper.

Rewinding the change.


In reply to: 260019470 [](ancestors = 260019470)

Contributor

It looks like it, but the objection of @yaeldekel is that it doesn't act like it, and that is deliberate and desirable, as far as I can tell. And if it impacts our public surface, I think we have to hold it to a higher standard. Anyway, thanks for looking at this.


In reply to: 260026858 [](ancestors = 260026858,260019470)

@@ -58,12 +58,17 @@ internal interface ISchemaBoundMapper
/// This interface combines <see cref="ISchemaBoundMapper"/> with <see cref="IRowToRowMapper"/>.
/// </summary>
[BestFriend]
internal interface ISchemaBoundRowMapper : ISchemaBoundMapper, IRowToRowMapper
internal interface ISchemaBoundRowMapper : ISchemaBoundMapper, IRowToRowMapperBase
Contributor

@TomFinley TomFinley Feb 25, 2019

IRowToRowMapperBase [](start = 67, length = 19)

This is what I feel needs to stop here. We have this descent here, but the output schema meaning is rather different. #Resolved

@sfilipi sfilipi force-pushed the getDependanciesRemovePredicates branch from 95a5518 to 383c26f Compare February 25, 2019 21:49
Contributor

@TomFinley TomFinley left a comment

Thank you @sfilipi !

@sfilipi sfilipi merged commit f609f5a into dotnet:master Feb 26, 2019
@sfilipi sfilipi deleted the getDependanciesRemovePredicates branch February 26, 2019 19:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Labels
API Issues pertaining the friendly API
4 participants