What namespaces to keep, what to drop, and what to hide #2326

Closed
sfilipi opened this issue Jan 30, 2019 · 10 comments

@sfilipi
Member

sfilipi commented Jan 30, 2019

The table below lists all the namespaces in ML.NET.
All of them currently appear on the docs site. Some, like Microsoft.ML.Ensemble, need to continue to exist in the code but are not ready to be exposed to users, so they can be hidden on the docs site.

Some others, like Microsoft.ML.Internal.Internallearn, can potentially be merged into other namespaces.
Let's annotate in the list below which namespaces need to be hidden on the docs site and which need to be removed altogether:

| Namespace | Drop it from code | Hide it from docs | Ships In Nuget |
| --- | --- | --- | --- |
| Microsoft.ML | | | Microsoft.ML |
| Microsoft.ML.Calibrator | | | Microsoft.ML |
| Microsoft.ML.Core.Data | drop | | |
| Microsoft.ML.Data | | | Microsoft.ML |
| Microsoft.ML.Data.IO | drop? | | |
| Microsoft.ML.Data.IO.Zlib | drop | | |
| Microsoft.ML.Ensemble | Microsoft.ML.Trainers.Ensemble | hide | None |
| Microsoft.ML.Ensemble.EntryPoints | | | Microsoft.ML |
| Microsoft.ML.Ensemble.OutputCombiners | Microsoft.ML.Trainers.Ensemble | hide | None |
| Microsoft.ML.Ensemble.Selector | Microsoft.ML.Trainers.Ensemble | hide | None |
| Microsoft.ML.Ensemble.Selector.DiversityMeasure | Microsoft.ML.Trainers.Ensemble | hide | None |
| Microsoft.ML.EntryPoints | | | Microsoft.ML |
| Microsoft.ML.EntryPoints.JsonUtils | drop? | | |
| Microsoft.ML.FactorizationMachine | Microsoft.ML.Trainers.FactorizationMachine | | Microsoft.ML |
| Microsoft.ML.ImageAnalytics | | | Microsoft.ML.ImageAnalytics |
| Microsoft.ML.ImageAnalytics.EntryPoints | | | Microsoft.ML.ImageAnalytics |
| Microsoft.ML.Internal.Calibration | Microsoft.ML.Calibrator | hide | Microsoft.ML |
| Microsoft.ML.Internal.CpuMath | | hide | Microsoft.ML |
| Microsoft.ML.Internal.Internallearn | drop? | | |
| Microsoft.ML.Internal.Internallearn.ResultProcessor | drop? | | None |
| Microsoft.ML.Learners | Microsoft.ML.Trainers | | Microsoft.ML |
| Microsoft.ML.LightGBM | Microsoft.ML.Trainers.LightGBM | | Microsoft.ML.LightGBM |
| Microsoft.ML.Model | | | Microsoft.ML.Onnx, Microsoft.ML |
| Microsoft.ML.Model.Onnx | | | Microsoft.ML.Onnx |
| Microsoft.ML.Numeric | drop? | | |
| Microsoft.ML.SamplesUtils | | | Microsoft.ML |
| Microsoft.ML.StaticPipe | | | Microsoft.ML.StaticPipe |
| Microsoft.ML.StaticPipe.Runtime | drop? | | |
| Microsoft.ML.Sweeper | | hide | Microsoft.ML |
| Microsoft.ML.Sweeper.Algorithms | | hide | Microsoft.ML |
| Microsoft.ML.TimeSeries | | | Microsoft.ML.TimeSeries |
| Microsoft.ML.TimeSeriesProcessing | drop? | | |
| Microsoft.ML.Tools | | hide? | |
| Microsoft.ML.Trainers | | | Microsoft.ML |
| Microsoft.ML.Trainers.FastTree | | | Microsoft.ML |
| Microsoft.ML.Trainers.FastTree.Internal | Microsoft.ML.Trainers.FastTree | | |
| Microsoft.ML.Trainers.HalLearners | | | Microsoft.ML.Trainers.HalLearners |
| Microsoft.ML.Trainers.KMeans | | | Microsoft.ML |
| Microsoft.ML.Trainers.Online | | | Microsoft.ML |
| Microsoft.ML.Trainers.PCA | | | Microsoft.ML |
| Microsoft.ML.Trainers.Recommender | | | |
| Microsoft.ML.Trainers.SymSgd | Microsoft.ML.Trainers.HalLearners | | |
| Microsoft.ML.Training | drop | | |
| Microsoft.ML.Transforms | | | |
| Microsoft.ML.Transforms.Categorical | | | |
| Microsoft.ML.Transforms.Conversions | | | |
| Microsoft.ML.Transforms.FeatureSelection | | | |
| Microsoft.ML.Transforms.Normalizers | | | |
| Microsoft.ML.Transforms.Projections | | | |
| Microsoft.ML.Transforms.TensorFlow | | | |
| Microsoft.ML.Transforms.Text | | | |
| Microsoft.ML.UniversalModelFormat.Onnx | Microsoft.ML.Model.Onnx | | |

@TomFinley
Contributor

TomFinley commented Jan 31, 2019

My surface reactions.

Regarding the namespaces reserved mostly for what we think of as classical "training" algorithms, we have tended to put them under Microsoft.ML.Trainers, from what I see (e.g., Microsoft.ML.Trainers.FastTree).

Microsoft.ML.UniversalModelFormat.Onnx should become Microsoft.ML.Model.Onnx I think.

Since Microsoft.ML.StaticPipe stuff now resides in a separate nuget, it might be appropriate to move it to the Microsoft.ML namespace. However this is of lower priority since that associated nuget will not be stable for v1 I expect. (Might still be nice, since that should be easy to do I think.)

I am not sure that I see the point in the seemingly elaborate hierarchy of Transforms namespaces (e.g., categorical, conversions, feature-selection, normalizers), but I do not feel strongly about it. The negative thing I see about this is that the inevitable result is that the "odd-ball" transformers that are in fact less commonly used wind up being uncategorizable and so put in the regular Transforms namespace. But I do not feel strongly about this.

My expectation is that anything with EntryPoints in the name except for the thing for running/executing entry-points would stay. E.g., Microsoft.ML.EntryPoints would be public, but anything else would be internal. So that renders what I am about to say less urgent, but I would still do it: I do not, for example, see the point in there being a Microsoft.ML.Ensemble.EntryPoints namespace. (We do not elsewhere have the static classes where entry-point definitions reside, reside in a separate namespace, any more than we have the command-line invocation logic live in a separate namespace, etc. etc.)

Microsoft.ML.Trainers.FastTree.Internal should just become the regular namespace. If something really is internal, the classes and types should just be internal. Nothing to do with the namespace.
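
As an illustration of the point above, here is a minimal sketch showing that visibility comes from the access modifier rather than from the namespace name. The `Demo.Trainers.FastTree` namespace and the `TreeTrainer` and `TreeBuilder` types are hypothetical stand-ins, not real ML.NET types.

```csharp
using System;
using System.Linq;

namespace Demo.Trainers.FastTree
{
    // Public: part of the surface that consumers of the assembly can see.
    public sealed class TreeTrainer
    {
        public int Train() => new TreeBuilder().Build();
    }

    // Internal: lives in the same namespace, yet is invisible outside this assembly.
    // No ".Internal" sub-namespace is needed to hide it.
    internal sealed class TreeBuilder
    {
        public int Build() => 42;
    }
}

public static class Program
{
    // Names of the types in the demo namespace that an external consumer can see.
    public static string[] ExportedTypeNames() =>
        typeof(Demo.Trainers.FastTree.TreeTrainer).Assembly
            .GetExportedTypes()
            .Where(t => t.Namespace == "Demo.Trainers.FastTree")
            .Select(t => t.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();

    public static void Main()
    {
        // Only the public type is listed; the internal helper is not exported.
        Console.WriteLine(string.Join(", ", ExportedTypeNames()));
    }
}
```

Both types share one namespace, so a docs generator that only walks exported types would presumably list `TreeTrainer` and skip `TreeBuilder` without any namespace-level filtering.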

I do not understand why we have "Microsoft.ML.TimeSeries" and "Microsoft.ML.TimeSeriesProcessing". Anyway, it seems like these belong in one of either the "Transforms" or "Trainers" style namespace, probably Transforms. (So Microsoft.ML.Transforms.TimeSeries.)

Microsoft.ML.SamplesUtils contains a single class. I feel like these things, if we want to keep them, should be extensions on DataOperationsCatalog. I also don't understand what return value being a string means. This whole API is just problematic, not just namespace choice.
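
A rough sketch of what "extensions on the catalog" could look like is below. Everything here is an assumption for illustration: `DataCatalogStub` stands in for the real catalog type, and `LoadHousingSample` and `HousingRow` are hypothetical names. Returning typed rows is just one possible answer to the question about the string return value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real catalog type; the actual ML.NET class has a different shape.
public sealed class DataCatalogStub
{
}

// Hypothetical row type for the sample data.
public sealed class HousingRow
{
    public HousingRow(float size, float price)
    {
        Size = size;
        Price = price;
    }

    public float Size { get; }
    public float Price { get; }
}

public static class SampleDataCatalogExtensions
{
    // Hypothetical extension method: the sample-data helper hangs off the catalog
    // and returns typed rows instead of a bare string.
    public static IReadOnlyList<HousingRow> LoadHousingSample(this DataCatalogStub catalog)
    {
        if (catalog == null) throw new ArgumentNullException(nameof(catalog));
        return new List<HousingRow>
        {
            new HousingRow(50f, 100f),
            new HousingRow(80f, 170f),
        };
    }
}

public static class Program
{
    public static void Main()
    {
        var catalog = new DataCatalogStub();
        foreach (var row in catalog.LoadHousingSample())
            Console.WriteLine($"{row.Size} -> {row.Price}");
    }
}
```

With this shape the helper is discoverable through IntelliSense on the catalog instance, and no separate samples namespace has to be imported just to find it.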

To be consistent with trainers, all the stuff in Microsoft.ML.Ensemble should be moved to Microsoft.ML.Trainers.Ensemble. This includes all those strange subnamespaces that seem to exist purely for the sake of communicating, "hey, look at me, the types in me have a different type hierarchy than other types elsewhere, isn't that super cool?" Which is a fairly silly use for namespaces. 😛

Microsoft.ML.Tools: from what I see, this should not (and as far as I can tell does not) contain public classes that ship as part of any nuget.

Microsoft.ML.Numeric seems to contain purely stuff used for LBFGS training. I guess I don't see the harm in it having its own namespace, but it should all be rendered internal, since the guts of our optimizer are hardly part of our public surface.

Microsoft.ML.Learners, should be moved to some appropriate choice of "trainers." Probably the same with so-called "InternalLearn".

Microsoft.ML.Internal.Internallearn.ResultProcessor. Goofy. Nothing in result processor lives in any public surface shipping as part of a nuget, so might as well give it the namespace Microsoft.ML.ResultProcessor.

Microsoft.ML.Calibrator, whatever happened to the idea of making calibration another data transform step?

@TomFinley
Contributor

Note that while namespace changes are fine, I feel like the first order of business really must be to choose what we put in Microsoft.ML root namespace itself. There are lots of things that are definitely in this category and already in there (MLContext and extension methods), but lots of things also that are not yet there that probably should be (e.g., ITrainerEstimator, IEstimator, ITransformer, and friends).

@sfilipi
Member Author

sfilipi commented Feb 5, 2019

We have both the Microsoft.ML.Trainers and Microsoft.ML.Learners namespaces. I think in the blogs/announcements we refer to them mostly as Learners. Drop Microsoft.ML.Trainers in favor of Microsoft.ML.Learners?

@sfilipi
Member Author

sfilipi commented Feb 6, 2019

> We have both the Microsoft.ML.Trainers and Microsoft.ML.Learners namespaces. I think in the blogs/announcements we refer to them mostly as Learners. Drop Microsoft.ML.Trainers in favor of Microsoft.ML.Learners?

Converging on Microsoft.ML.Trainers instead, since that seems to be the namespace the majority of trainers live in.

cc @CESARDELATORRE @JRAlexander @luisquintanilla

@sfilipi sfilipi added API Issues pertaining the friendly API documentation Related to documentation of ML.NET labels Feb 6, 2019
@JRAlexander

I'd prefer Microsoft.ML.Algorithms.

@JRAlexander

It would match https://dotnet.microsoft.com/learn/machinelearning-ai/what-is-mldotnet. "Choose Algorithm"

@TomFinley
Contributor

TomFinley commented Feb 7, 2019

> I'd prefer Microsoft.ML.Algorithms.

I would not. We have trainers, transforms, predictors... if some random note written by someone happened to use such a generic term as "algorithm," and it is apparently having the ability to affect how people think about such things, that is what must be changed.

@TomFinley
Contributor

Regarding Microsoft.ML.Data.IO, I think everything here should be hidden; it can probably be moved into Microsoft.ML.Data if we care. I personally might keep it where it is, though, since it is very specific to binary loaders/savers.

@JRAlexander

JRAlexander commented Feb 7, 2019

Ok, then what about Microsoft.ML.LearningAlgorithms?
Again it would match the terms we are already using on the marketing site... and are industry standard...

Choose Algorithm
Choose the learning algorithm that will provide the highest accuracy for your scenario. ML.NET offers the following types of learners:

Linear (e.g. SymSGD, SDCA)
Boosted Trees (e.g. FastTree, LightGBM)
K-Means
SVM
Averaged Perceptron

sfilipi added a commit that referenced this issue Feb 10, 2019
* removing the Microsoft.ML.Learners and the Microsoft.ML.Trainers.SymSGD namespaces.

* removing the Microsoft.ML.Internal.Internallearn.ResultProcessor and the Microsoft.ML.Trainers.FastTree.Internal namespaces.

* regenerating the catalog
@eerhardt
Member

eerhardt commented Feb 25, 2019

FYI -

  1. We still have public types in the ML.Trainers and ML.Training namespaces. These should be consolidated. "Drop Microsoft.ML.Training and replace it with Microsoft.ML.Trainers" #2713 has been logged to track this.
  2. We still have Microsoft.ML.Internal.Calibration and Microsoft.ML.Internal.Internallearn, which should be hidden/moved/renamed. "Microsoft.ML.Internal.Internallearn should be hidden/moved/renamed." #2714 has been logged to track this.
  3. We also have 1 type in:

```csharp
namespace Microsoft.ML.Data.Evaluators.Metrics {
    public sealed class AnomalyDetectionMetrics {
        public double Auc { get; }
        public double DrAtK { get; }
    }
}
```

  4. We also have 2 public types in:

```csharp
namespace Microsoft.ML.Model {
    public interface ICanSaveModel {
        void Save(ModelSaveContext ctx);
    }
    public sealed class ModelSaveContext : IDisposable {
        public void Dispose();
    }
}
```

This isn't enough types (IMO) for its own namespace.
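
One way to find such sparsely populated namespaces mechanically is to group an assembly's exported types by namespace and flag the small groups. The sketch below is an assumption about how such an audit might be written, not a tool from the repository. `NamespaceAudit`, `CountPublicTypes`, and `SparseNamespaces` are hypothetical names, and the threshold of 2 is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class NamespaceAudit
{
    // Counts externally visible types (including nested public types) per namespace.
    public static Dictionary<string, int> CountPublicTypes(Assembly assembly) =>
        assembly.GetExportedTypes()
            .GroupBy(t => t.Namespace ?? "(global)")
            .ToDictionary(g => g.Key, g => g.Count());

    // Namespaces whose exported type count is at or below the threshold, sorted by name.
    public static IEnumerable<string> SparseNamespaces(Assembly assembly, int threshold) =>
        CountPublicTypes(assembly)
            .Where(kv => kv.Value <= threshold)
            .Select(kv => kv.Key)
            .OrderBy(ns => ns, StringComparer.Ordinal);

    public static void Main()
    {
        // The core library is used here only because it is always available;
        // in practice one would pass the assembly being reviewed.
        foreach (var ns in SparseNamespaces(typeof(object).Assembly, 2))
            Console.WriteLine(ns);
    }
}
```

Running this against each shipped assembly would give a candidate list for merging, which still needs human judgment since a small namespace can be intentional.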

5 participants