So in "command line" world, we have things that look like this:
machinelearning/src/Microsoft.ML.Data/Transforms/ValueToKeyMappingTransformer.cs
Lines 116 to 123 in eed91b9
[Argument(ArgumentType.AtMostOnce, IsInputFileName = true, HelpText = "Data file containing the terms", ShortName = "data", SortOrder = 110, Visibility = ArgumentAttribute.VisibilityType.CmdLineOnly)]
[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the text column containing the terms", ShortName = "termCol", SortOrder = 112, Visibility = ArgumentAttribute.VisibilityType.CmdLineOnly)]
public string TermsColumn;
This makes sense: when invoking a command line, you are not working in the context of an existing process but starting a new one, so the most plausible source for data is some file, and we have to specify how to load it, and so on.
However, then we enter API land, and (understandably, to be clear) people just decided to do a direct translation, as we see below:
machinelearning/src/Microsoft.ML.Data/Transforms/ValueToKeyMappingEstimator.cs
Lines 41 to 42 in eed91b9
That the API might resemble the command line as a first preference is understandable, but in this specific context of an API, a departure from this trend would make sense. We've invented what amounts to an entirely new API to load data from a source, when we already have mechanisms to do this.
If we wanted this to work over input IDataViews, which seems to be what the authors are really getting at, then it should just do so directly. This has a few advantages:
No new way of loading files distinct from the existing API for that same task,
No IDataLoader in the transform, which, per "Internalize concepts of IDataTransform/Loader/TransformTemplate" (#1995), is something we need to do anyway. (This in particular is why we might consider this to have some greater urgency.)
The problematic components affected by this issue seem to be the following, based on cursory examination:
CustomStopWordsRemovingEstimator (and the associated transformer, which should be named "transformer" but is currently named "transform").
ValueToKeyMappingEstimator,
There may be more parts of the public API that use IDataLoader directly; these were just the obvious ones.
Edit: From what I can see, the public API surface for CustomStopWordsRemovingEstimator and the stop words estimator did not actually make the mistake of putting a loader in the estimator/transformer-based API, contrary to my cursory examination. That is good, but I'll still rename the transform to transformer.
/cc @Ivanidzo4ka @sfilipi
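To make the proposal above a bit more concrete, here is a minimal, self-contained sketch of the shape I have in mind: the term-mapping component takes already-loaded data plus a column name, rather than a file name plus a loader plus a column name. Everything in it is hypothetical and only for illustration: `ITermsView`, `InMemoryTermsView` and `TermMappingSketch` are stand-ins invented for this sketch, not ML.NET types, and `ITermsView` is deliberately much simpler than the real IDataView.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an already-loaded data view.
// This is NOT ML.NET's IDataView; it only keeps the sketch self-contained.
public interface ITermsView
{
    IEnumerable<string> GetColumn(string columnName);
}

// Hypothetical in-memory implementation, used here in place of any file loading.
public sealed class InMemoryTermsView : ITermsView
{
    private readonly Dictionary<string, string[]> _columns;

    public InMemoryTermsView(Dictionary<string, string[]> columns) => _columns = columns;

    public IEnumerable<string> GetColumn(string columnName) => _columns[columnName];
}

// Hypothetical estimator-like class: it receives the terms as data,
// so it needs no file name and no loader of its own.
public sealed class TermMappingSketch
{
    private readonly Dictionary<string, int> _termToKey = new Dictionary<string, int>();

    public TermMappingSketch(ITermsView terms, string termsColumn)
    {
        foreach (var term in terms.GetColumn(termsColumn))
        {
            // Assign keys in order of first appearance, starting at 1.
            if (!_termToKey.ContainsKey(term))
                _termToKey[term] = _termToKey.Count + 1;
        }
    }

    // Returns the key for a known term, or 0 for a term that was not in the data.
    public int Map(string value) => _termToKey.TryGetValue(value, out var key) ? key : 0;
}

public static class Program
{
    public static void Main()
    {
        var terms = new InMemoryTermsView(new Dictionary<string, string[]>
        {
            ["Term"] = new[] { "apple", "banana", "apple", "cherry" }
        });

        var mapping = new TermMappingSketch(terms, "Term");
        Console.WriteLine(mapping.Map("banana")); // prints 2
        Console.WriteLine(mapping.Map("durian")); // prints 0
    }
}
```

The point of the sketch is only the constructor signature: how the terms were loaded (file, database, in memory) is the caller's business and goes through whatever loading mechanism already exists.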