Skip to content

Remove IDataLoader from public API surfaces for specifying data #2231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomFinley opened this issue Jan 24, 2019 · 1 comment · Fixed by #2245
Closed

Remove IDataLoader from public API surfaces for specifying data #2231

TomFinley opened this issue Jan 24, 2019 · 1 comment · Fixed by #2245
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@TomFinley
Copy link
Contributor

So in "command line" world, we have things that look like this:

[Argument(ArgumentType.AtMostOnce, IsInputFileName = true, HelpText = "Data file containing the terms", ShortName = "data", SortOrder = 110, Visibility = ArgumentAttribute.VisibilityType.CmdLineOnly)]
public string DataFile;
[Argument(ArgumentType.Multiple, HelpText = "Data loader", NullName = "<Auto>", SortOrder = 111, Visibility = ArgumentAttribute.VisibilityType.CmdLineOnly, SignatureType = typeof(SignatureDataLoader))]
public IComponentFactory<IMultiStreamSource, IDataLoader> Loader;
[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the text column containing the terms", ShortName = "termCol", SortOrder = 112, Visibility = ArgumentAttribute.VisibilityType.CmdLineOnly)]
public string TermsColumn;

This makes sense, considering that when invoking a command line, you are not working in the context of an existing process but starting a new one, so the most plausible source for data is some file, which we have to specify how to load and so on and so on.

However, then we enter API land, and (understandably, to be clear) people just decided to do a direct translation, as we see below:

string file = null, string termsColumn = null,
IComponentFactory<IMultiStreamSource, IDataLoader> loaderFactory = null)

That the API might resemble command line as a first preference is understandable, but in this specific context of an API, a variance from this trend would make sense. We've invented what amounts to an entirely new API to load data from a source, when we already have mechanisms to do this.

If we wanted this to work over input IDataViews, which seems to be what the authors are really getting at, then it should just do so directly. This has a few advantages:

  1. No new way of loading files distinct from existing API for that same task,
  2. Simpler method signatures,
  3. Hides IDataLoader transform, which is something relating to Internalize concepts of IDataTransform/Loader/TransformTemplate. #1995 we need to do anyway. (This in particular is why we might consider this to have some greater urgency.)

/cc @Ivanidzo4ka @sfilipi

@TomFinley TomFinley added the API Issues pertaining the friendly API label Jan 24, 2019
@TomFinley TomFinley self-assigned this Jan 24, 2019
@TomFinley
Copy link
Contributor Author

TomFinley commented Jan 24, 2019

The problematic components affected by this issue seem to be the following, based on cursory examination:

  • CustomStopWordsRemovingEstimator (also the associated transformer that should be transformer, but is currently transform).
  • ValueToKeyMappingEstimator,

There may be more parts affecting the public API using IDataLoader directly, these were just the obvious ones.

Edit: From what I see actually the public API area for CustomStopWordsRemovingEstimator and the stop words estimator did not make the mistake of putting loader in the estimator/transformer based API, contrary to my cursory examination. Which is good, but I'll still rename the transform to transformer.

@shauheen shauheen added this to the 0119 milestone Feb 6, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants