Skip to content

Explicit implementation for IsRowToRowMapper and GetRowToRowMapper #2673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 25, 2019

Conversation

artidoro
Copy link
Contributor

Fixes #2540.

As explained in the issue I make the implementations of two methods of theITransformer interface IsRowToRowMapper and GetRowToRowMapper explicit.

In a previous PR #2431 I made the implementation of ICanSaveModel.Save explicit. Save is part of ITransformer as it derives from ICanSaveModel. I found a few instances that have been changed since my last PR in the TimeSeries assembly and that still had a regular implementation for `Save. I made their implementation explicit in this PR as it is very similar in spirit.

@artidoro artidoro added the API Issues pertaining the friendly API label Feb 21, 2019
@artidoro artidoro self-assigned this Feb 21, 2019
@codecov
Copy link

codecov bot commented Feb 21, 2019

Codecov Report

Merging #2673 into master will decrease coverage by <.01%.
The diff coverage is 82.92%.

@@            Coverage Diff             @@
##           master    #2673      +/-   ##
==========================================
- Coverage   71.57%   71.57%   -0.01%     
==========================================
  Files         800      800              
  Lines      141789   141792       +3     
  Branches    16107    16107              
==========================================
+ Hits       101488   101489       +1     
- Misses      35851    35854       +3     
+ Partials     4450     4449       -1
Flag Coverage Δ
#Debug 71.57% <82.92%> (-0.01%) ⬇️
#production 67.87% <82.35%> (ø) ⬆️
#test 85.73% <85.71%> (-0.02%) ⬇️
Impacted Files Coverage Δ
...eSeries/SequentialAnomalyDetectionTransformBase.cs 84.21% <0%> (ø) ⬆️
test/Microsoft.ML.Benchmarks/HashBench.cs 0% <0%> (ø) ⬆️
...c/Microsoft.ML.Data/Prediction/PredictionEngine.cs 68.18% <0%> (ø) ⬆️
.../Microsoft.ML.TimeSeries/IidChangePointDetector.cs 69.82% <100%> (ø) ⬆️
...Microsoft.ML.Data/Scorers/PredictionTransformer.cs 97.02% <100%> (ø) ⬆️
.../Microsoft.ML.TimeSeries/SsaChangePointDetector.cs 77.51% <100%> (ø) ⬆️
...crosoft.ML.TimeSeries/SequentialTransformerBase.cs 53.01% <100%> (ø) ⬆️
src/Microsoft.ML.TimeSeries/SsaSpikeDetector.cs 82.9% <100%> (ø) ⬆️
...icrosoft.ML.Transforms/CustomMappingTransformer.cs 92% <100%> (ø) ⬆️
src/Microsoft.ML.TimeSeries/IidSpikeDetector.cs 72.54% <100%> (ø) ⬆️
... and 12 more

@@ -123,9 +121,9 @@ private static bool IsChainRowToRowMapper(IDataView view)
return true;
}

public bool IsRowToRowMapper { get; }
bool ITransformer.IsRowToRowMapper => IsChainRowToRowMapper(_xf);
Copy link
Contributor

@TomFinley TomFinley Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsChainRowToRowMapper [](start = 46, length = 21)

This operation is somewhat involved, so I would prefer if you could have a backing field to store this result as we had in the initial implementation. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good let me do that!


In reply to: 258775778 [](ancestors = 258775778)

@@ -162,7 +162,7 @@ internal SsaSpikeDetector(IHostEnvironment env, ModelLoadContext ctx)
InternalTransform.Host.CheckDecode(InternalTransform.IsAdaptive == false);
}

public override void Save(ModelSaveContext ctx)
private protected override void SaveModel(ModelSaveContext ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private protected override void SaveModel [](start = 8, length = 41)

Ahh, thank you very much for finding more of these!!

Copy link
Contributor

@TomFinley TomFinley Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually wait, these are already internal, so probably not necessary... still, eh... I don't know. If you think it helps with consistency and readability.


In reply to: 258775986 [](ancestors = 258775986)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the containing classes are public. The classes in which I made the implementation of ICanSaveModel,Save explicit derive from SsaAnomalyDetectionBaseWrapper or IidAnomalyDetectionBaseWrapper which are both public.
There is just one that does not, and that's the Mapper class in SequentialAnomalyDetectionTransformBase which is internal. This was the only instance where a mapper class does not have explicit implementation of the Save method. In the previous PR, we decided the that despite this being an internal class it was beneficial to do an explicit implementation for consistency as it implements ICanSaveModel. So all mappers elsewhere have an explicit implementation of that method.


In reply to: 258776058 [](ancestors = 258776058,258775986)

@TomFinley
Copy link
Contributor

Thank you @artidoro I finished a pass, the only real high level things I saw were not storing the result of that somewhat expensive call in the wrapper. More minor was the somewhat needless internalization of members of an already internal class, but I consider that less urgent (especially if you feel strongly that being consistent in that fashion is desirable, which I could understand).

@@ -348,7 +348,7 @@ public DataViewSchema.DetachedColumn[] GetOutputColumns()
return col => false;
}

public void Save(ModelSaveContext ctx) => _parent.SaveModel(ctx);
void ICanSaveModel.Save(ModelSaveContext ctx) => _parent.SaveModel(ctx);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was internal already, but in every other Mapper elsewhere we have made this one explicit despite it being internal for consistency.

{
Contracts.CheckValue(inputSchema, nameof(inputSchema));
Contracts.Check(IsRowToRowMapper, nameof(GetRowToRowMapper) + " method called despite " + nameof(IsRowToRowMapper) + " being false.");
Contracts.Check(((ITransformer)this).IsRowToRowMapper, nameof(ITransformer.GetRowToRowMapper) + " method called despite " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsRowToRowMapper [](start = 49, length = 16)

I feel we should hide IsRowToRowMapper as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are asking whether we can make IsRowToRowMapper internal.

I discussed this with Tom at length and the answer is that we should not make it internal. The reason is that when a ITransformer is a IRowToRowMapper, the performance of the prediction engine can be significantly increased compared to when it is not. Tom talked of a difference of orders of magnitude.

Since it is such an important characteristics of the ITransformer we think it is better to keep it as part of the public API. This is especially true when we think about possibly allowing expansion of the library with custom components in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its for potential use, I feel we should hide for this version because I don't think users will be able to control the performance of ML.NET in the near future.


In reply to: 259442579 [](ancestors = 259442579)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let @TomFinley explain his reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general the policy is a good one, that we should try to limit the scope of the public API. However, I feel very strongly that something like IRowToRowMapper has to exist.

The reason why I think this is the case is that I think back to #973. Literally everything can be captured as mapping from IDataView to IDataView, as we have seen. But we see that if we have a more "specific" interface for this row-to-row mapping, we can gain some benefit -- in this case, orders of magnitude speedup and a far more clear interface from using this "bypass" mechanism, in the case where we can use it.

From this perspective, we might say, "very well, then IRowToRowMapper should just be internal and prediction engine will still be free to use it." Which is of course possible. But then I think to myself, if PredictionEngine is useful from a reflection-driven perspective (as it quite clearly is), what is the dynamic API's answer to this scenario? There is nothing except IRowToRowMapper.

So, I think the case for something like IRowToRowMapper is clear, so I think we should ship it. So does it have to be that specifically? I don't see any better proposal than that interface. (Though perhaps it could be refined, e.g., with @sfilipi making it operating over schema columns specifically rather than int => bool delegates, etc.) But I think the principle has proven itself useful and should be made public, just like IDataView has been around, proven itself useful, and should be made public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I know this already closed, but though I'd answer the direct question. 😄 )

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @artidoro !!

@artidoro artidoro merged commit 4acf5aa into dotnet:master Feb 25, 2019
@artidoro artidoro deleted the explicit branch March 13, 2019 17:55
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants