Skip to content

More internalization of non-public Microsoft.ML.Data #2453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 7, 2019

Conversation

TomFinley
Copy link
Contributor

Yet another step along #1602, where much miscellaneous internal infrastructure in Microsoft.ML.Data gets hidden, or even in some cases removed. I think this round removed about 600 or so public members or thereabouts. (Though TBH about 500 of that was just due to IEvaluator implementations alone).

Some of the commits are very small (e.g., internalize vector utils is more or less what it sounds like), whereas others are more involved.

Crucially, prediction transformers are no longer generic on the scorers, and all scorers are internal. This is related to #1995, since scorers were IDataTransformer implementors, so this is another place where we're removing things related to this deprecated interface from our public surface.

I also took the opportunity to do some opportunistic changing of the tests to use more of the public surface area, as opposed to the non-public APIs, especially around text writing. Incidentally filed #2452 as a side effect of that work.

5 4 4 5 7 10 3 2 0 0.566368 -0.112074189 0.169326738 -0.5519618 0.517083168 -0.2568235
3 1 1 1 2 2 3 1 0 0.497458071 -0.293033749 0.419352561 -0.39683342 0.572824 -0.07215268
6 8 8 1 3 4 3 7 0 -0.382303447 -0.432640046 0.289225727 -0.4996817 0.571184337 -0.08415316
5 1 1 1 2 1 3 1 0 -0.157029659 -0.555585265 0.490177631 -0.305056125 0.35670203 -0.453979075
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that because RFF as its name suggests uses PRNG extensively, and the random initialization behavior differs between ConsoleEnvironment and MLContext, that this is why this baseline changed, in case anyone was curious.

@@ -62,13 +63,13 @@ ColumnType GetType(Schema schema, string name)
t = GetType(data.Schema, "f4");
Assert.True(t is VectorType vt4 && vt4.ItemType == NumberType.R4 && vt4.Size == 0);

data = ColumnSelectingTransformer.CreateKeep(Env, data, new[] { "f1", "f2", "f3", "f4" });
data = ML.Transforms.SelectColumns("f1", "f2", "f3", "f4").Fit(data).Transform(data);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm becoming more sympathetic to those that consider #2354 a priority @artidoro and @glebuk. 😆

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

{

Do we want users to manipulate evaluators?
I might be wrong, but I thought that we only want users to handle evaluation metrics produced by the evaluators. As such, can we make this class as well as the base class internal? There are quite a few classes that derive from PerInstanceEvaluatorBase. #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs:871 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

{

If we want to keep this public, can we make the fields in here internal, like LoaderSignature and such.


In reply to: 461298043 [](ancestors = 461298043)


Refers to: src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs:871 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

public sealed class ClusteringPerInstanceEvaluator : PerInstanceEvaluatorBase

I would think that the same comment about internalization of the class or of the fields applies here. #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/ClusteringEvaluator.cs:565 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

public sealed class MultiClassPerInstanceEvaluator : PerInstanceEvaluatorBase

And here as well, same comment about internalization of the class or of the fields. #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/MultiClassClassifierEvaluator.cs:538 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@@ -25,7 +23,7 @@ public WordEmbeddingsTests(ITestOutputHelper helper)
public void TestWordEmbeddings()
{
var dataPath = GetDataPath(TestDatasets.Sentiment.trainFilename);
var data = new TextLoader(Env,
var data = new TextLoader(ML,
Copy link
Member

@sfilipi sfilipi Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new TextLoader [](start = 23, length = 14)

you don't seem to like ML.Data.CreateTextLoader :) #Resolved

Copy link
Contributor Author

@TomFinley TomFinley Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure. I haven't tried to use it much yet. So I might like it. I'm not trying to make the tests so that they all call the public surface yet (don't have time to do that all right now), just the things relevant to the stuff that I was deleting. Then of course I saw this DataSaverUtils.SaveDataView and that just ... well, that had to go. :) Of course I couldn't quite make it all go due to a flaw our API being incomplete (see #2452), but I could at least make a dent in it. Plus I get to start using MLContext more than ConsoleEnvironment, which strikes me as good.


In reply to: 254556100 [](ancestors = 254556100)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

public sealed class MultiOutputRegressionPerInstanceEvaluator : PerInstanceEvaluatorBase

Here as well, if not the class, we could internalize some of the fields/constructors. #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/MultiOutputRegressionEvaluator.cs:374 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

public sealed class QuantileRegressionPerInstanceEvaluator : PerInstanceEvaluatorBase

And here as well, is it possible to internalize the class? Otherwise the fields should be cleaned up a bit. #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/QuantileRegressionEvaluator.cs:261 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

{

Interesting, here it is internal, but derives from IDataTransform? #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/RankerEvaluator.cs:543 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

{

Same question here, can we internalize the class/baseclass or the fields? #Resolved


Refers to: src/Microsoft.ML.Data/Evaluators/RegressionEvaluator.cs:194 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

    {

Not your change, but it was a bit bothering, could you remove the extra line? #Resolved


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:112 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@@ -164,10 +161,8 @@ public abstract class SingleFeaturePredictionTransformerBase<TModel, TScorer> :
/// </summary>
public ColumnType FeatureColumnType { get; }

Copy link
Contributor

@artidoro artidoro Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not making this internal? #Pending

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine actually, disregard that sorry.


In reply to: 254558228 [](ancestors = 254558228)

Copy link
Contributor Author

@TomFinley TomFinley Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the class? I think this has to remain public, but I could be wrong. This is publishing additional information about what feature columns it is consuming, and it seems like people find that sort of information valueable on transformers (see, e.g., #2454, where the input column name is considered valuable information). Of course, there's a question of whether we actually want our transformers to publish that for v1. That I'm less sure about. So maybe we do want to internalize this. I don't know. But I wasn't planning on rocking the boat on that yet, just internalizing the stuff I know must be hidden.


In reply to: 254560344 [](ancestors = 254560344,254558228)

@@ -55,7 +53,8 @@ public abstract class PredictionTransformerBase<TModel, TScorer> : IPredictionTr

public bool IsRowToRowMapper => true;

Copy link
Contributor

@artidoro artidoro Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a line of XML? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could.


In reply to: 254558375 [](ancestors = 254558375)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

    {

Same here, if possible a line of XML would be great (I usually get inspired from the interface) #Resolved


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:201 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

        : base(Contracts.CheckRef(env, nameof(env)).Register(nameof(BinaryPredictionTransformer<TModel>)), ctx)

Can we make this internal too? I think these should never be instantiated directly from the constructors right? #Resolved


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:327 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

        : base(Contracts.CheckRef(env, nameof(env)).Register(nameof(MulticlassPredictionTransformer<TModel>)), ctx)

Same here, could we internalize the constructor? #Resolved


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:393 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

{

Is it possible to internalize this class? I think most Arguments classes that derive from it are themselves in an internal class. #Resolved


Refers to: src/Microsoft.ML.Data/Scorers/RowToRowScorerBase.cs:305 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

{

Well, PerInstanceEvaluatorBase is not an evaluator, so it wasn't quite my focus, but I suppose there's little harm in hiding it now. I was going to make a push to internalize the IRowMapper implementations later, but you're right, no harm in at least getting started...


In reply to: 461298134 [](ancestors = 461298134,461298043)


Refers to: src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs:871 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

{

Ah right, thanks.


In reply to: 461302782 [](ancestors = 461302782)


Refers to: src/Microsoft.ML.Data/Scorers/RowToRowScorerBase.cs:305 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

        : base(Contracts.CheckRef(env, nameof(env)).Register(nameof(BinaryPredictionTransformer<TModel>)), ctx)

Right, yes we should ... I think I did in some other places but somehow wasn't consistent with it. While doing this I also realized FieldAwareFactorizationMachinePredictionTransformer was doing some bad things (exposing mutable arrays as properties) so opportunistically fixed that too.


In reply to: 461301953 [](ancestors = 461301953)


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:327 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

{

Yes!


In reply to: 461299450 [](ancestors = 461299450)


Refers to: src/Microsoft.ML.Data/Evaluators/RegressionEvaluator.cs:194 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

TomFinley commented Feb 7, 2019

{

Rankers are a little different in that they work collectively over multiple examples per query. So the convenience class could not be used in this case.


In reply to: 461299201 [](ancestors = 461299201)


Refers to: src/Microsoft.ML.Data/Evaluators/RankerEvaluator.cs:543 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

    {

IT WILL STAY HERE TILL THE END OF TIME.


In reply to: 461300615 [](ancestors = 461300615)


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:112 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

    {

That might work sometimes. I got uninspired by the current documentation on that interface.


In reply to: 461301600 [](ancestors = 461301600)


Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:201 in 26089f2. [](commit_id = 26089f2, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

Hi @artidoro I think I resolved your comments, so if you could take a look to confirm that would be great, thanks!

@codecov
Copy link

codecov bot commented Feb 7, 2019

Codecov Report

Merging #2453 into master will decrease coverage by <.01%.
The diff coverage is 98.71%.

@@            Coverage Diff             @@
##           master    #2453      +/-   ##
==========================================
- Coverage   71.22%   71.22%   -0.01%     
==========================================
  Files         785      785              
  Lines      141030   140977      -53     
  Branches    16116    16116              
==========================================
- Hits       100455   100408      -47     
+ Misses      36106    36098       -8     
- Partials     4469     4471       +2
Flag Coverage Δ
#Debug 71.22% <98.71%> (-0.01%) ⬇️
#production 67.58% <88.88%> (ø) ⬆️
#test 85.28% <100%> (-0.03%) ⬇️

@artidoro
Copy link
Contributor

artidoro commented Feb 7, 2019

Thank you for taking care of them Tom!

@TomFinley TomFinley merged commit 453f191 into dotnet:master Feb 7, 2019
@TomFinley TomFinley deleted the Hide2 branch February 11, 2019 20:21
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants