Skip to content

Rename IDataLoader, IDataReader and IDataReaderEstimator #2731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 27, 2019

Conversation

artidoro
Copy link
Contributor

Fixes #2144.

As discussed in the issue, it was agreed that TextReader, BinaryReader and IDataReader were bad names because they overlap with .NET concepts.

In this PR I:

  1. rename IDataLoader to ILegacyDataLoader (commit 1).
  2. rename IDataReader to IDataLoader and IDataReaderEstimator to IDataLoaderEstimator (commit 2).

@artidoro artidoro added the API Issues pertaining the friendly API label Feb 26, 2019
@artidoro artidoro self-assigned this Feb 26, 2019
Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codecov
Copy link

codecov bot commented Feb 26, 2019

Codecov Report

Merging #2731 into master will not change coverage.
The diff coverage is 87.79%.

@@           Coverage Diff           @@
##           master    #2731   +/-   ##
=======================================
  Coverage   71.64%   71.64%           
=======================================
  Files         807      807           
  Lines      142337   142337           
  Branches    16117    16117           
=======================================
  Hits       101983   101983           
+ Misses      35918    35917    -1     
- Partials     4436     4437    +1
Flag Coverage Δ
#Debug 71.64% <87.79%> (ø) ⬆️
#production 67.89% <77.68%> (ø) ⬆️
#test 85.83% <96.21%> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Data/Commands/TrainTestCommand.cs 91.73% <ø> (ø) ⬆️
src/Microsoft.ML.Data/Commands/TestCommand.cs 97.01% <ø> (ø) ⬆️
...c/Microsoft.ML.Transforms/Text/WordBagTransform.cs 79.08% <ø> (ø) ⬆️
...Microsoft.ML.StaticPipe/StaticPipeInternalUtils.cs 82.44% <ø> (ø) ⬆️
....ML.Data/DataLoadSave/Transpose/TransposeLoader.cs 49.22% <ø> (ø) ⬆️
...rosoft.ML.Data/DataLoadSave/Binary/BinaryLoader.cs 65.22% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/GamModelParameters.cs 46.51% <ø> (ø) ⬆️
src/Microsoft.ML.Core/Data/IEstimator.cs 84.53% <ø> (ø) ⬆️
src/Microsoft.ML.Parquet/ParquetLoader.cs 67.54% <ø> (ø) ⬆️
...soft.ML.Data/DataLoadSave/DataOperationsCatalog.cs 61.11% <ø> (ø) ⬆️
... and 130 more

where TLastTransformer : class, ITransformer
{
/// <summary>
/// The underlying data reader.
/// </summary>
public readonly IDataReader<TSource> Reader;
public readonly IDataLoader<TSource> Reader;
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public readonly IDataLoader<TSource> Reader;
public readonly IDataLoader<TSource> Loader;

Please double-check other public fields and public functions' arguments. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also update the comment here.


In reply to: 260381373 [](ancestors = 260381373)

/// <summary>
/// The chain of transformers (possibly empty) that are applied to data upon reading.
/// </summary>
public readonly TransformerChain<TLastTransformer> Transformer;

public CompositeDataReader(IDataReader<TSource> reader, TransformerChain<TLastTransformer> transformerChain = null)
public CompositeDataReader(IDataLoader<TSource> reader, TransformerChain<TLastTransformer> transformerChain = null)
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public CompositeDataReader(IDataLoader<TSource> reader, TransformerChain<TLastTransformer> transformerChain = null)
public CompositeDataReader(IDataLoader<TSource> loader, TransformerChain<TLastTransformer> transformerChain = null)
``` #Resolved

@@ -90,7 +90,7 @@ public static class CompositeDataReader
/// <summary>
/// Save the contents to a stream, as a "model file".
/// </summary>
public static void SaveTo<TSource>(this IDataReader<TSource> reader, IHostEnvironment env, Stream outputStream)
public static void SaveTo<TSource>(this IDataLoader<TSource> reader, IHostEnvironment env, Stream outputStream)
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static void SaveTo<TSource>(this IDataLoader<TSource> reader, IHostEnvironment env, Stream outputStream)
public static void SaveTo<TSource>(this IDataLoader<TSource> loader, IHostEnvironment env, Stream outputStream)
``` #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!


In reply to: 260381588 [](ancestors = 260381588)

@@ -106,7 +106,7 @@ public static void SaveTo<TSource>(this IDataReader<TSource> reader, IHostEnviro
using (var ch = env.Start("Loading pipeline"))
{
ch.Trace("Loading data reader");
ModelLoadContext.LoadModel<IDataReader<IMultiStreamSource>, SignatureLoadModel>(env, out var reader, rep, "Reader");
ModelLoadContext.LoadModel<IDataLoader<IMultiStreamSource>, SignatureLoadModel>(env, out var reader, rep, "Reader");
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ModelLoadContext.LoadModel<IDataLoader<IMultiStreamSource>, SignatureLoadModel>(env, out var reader, rep, "Reader");
ModelLoadContext.LoadModel<IDataLoader<IMultiStreamSource>, SignatureLoadModel>(env, out var loader, rep, "Reader");
``` #Resolved

@@ -106,7 +106,7 @@ public static void SaveTo<TSource>(this IDataReader<TSource> reader, IHostEnviro
using (var ch = env.Start("Loading pipeline"))
{
ch.Trace("Loading data reader");
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch.Trace("Loading data reader");
ch.Trace("Loading data loader");
``` #Resolved

@@ -8,13 +8,13 @@ namespace Microsoft.ML.Data
/// An estimator class for composite data reader.
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// An estimator class for composite data reader.
/// An estimator class for composite data loader.

So many readers. Is it possible to do a global text replacement? #Resolved

@@ -14,7 +14,7 @@ public static class DataReaderExtensions
/// </summary>
/// <param name="reader">The reader to use.</param>
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <param name="reader">The reader to use.</param>
/// <param name="loader">The loader to use.</param>
``` #Resolved

public interface IDataReaderEstimator<in TSource, out TReader>
where TReader : IDataReader<TSource>
public interface IDataLoaderEstimator<in TSource, out TLoader>
where TLoader : IDataLoader<TSource>
{
// REVIEW: you could consider the transformer to take a different <typeparamref name="TSource"/>, but we don't have such components
// yet, so why complicate matters?
/// <summary>
/// Train and return a data reader.
Copy link
Member

@singlis singlis Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reader [](start = 36, length = 6)

loader
#Resolved

@singlis
Copy link
Member

singlis commented Feb 26, 2019

    /// The output schema of the reader.

loader
#Resolved


Refers to: src/Microsoft.ML.Core/Data/IEstimator.cs:237 in d55999f. [](commit_id = d55999f, deletion_comment = False)

@@ -225,7 +225,7 @@ internal bool TryFindColumn(string name, out Column column)
/// The 'data reader' takes a certain kind of input and turns it into an <see cref="IDataView"/>.
/// </summary>
/// <typeparam name="TSource">The type of input the reader takes.</typeparam>
Copy link
Member

@singlis singlis Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reader [](start = 56, length = 6)

loader - I would do a pass over the comments replacing reader with loader. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!


In reply to: 260394231 [](ancestors = 260394231)

@@ -58,7 +58,7 @@ public static DataDebuggerPreview Preview(this ITransformer transformer, IDataVi
/// <param name="reader">The data reader to preview</param>
/// <param name="source">The source to pull the data from</param>
/// <param name="maxRows">Maximum number of rows to pull</param>
Copy link
Member

@singlis singlis Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change reader to loader? If changed, update the paramref in the comments. #Resolved

@@ -18,7 +18,7 @@ public static class LearningPipelineExtensions
/// Create a new composite reader estimator, by appending another estimator to the end of this data reader estimator.
/// </summary>
public static CompositeReaderEstimator<TSource, TTrans> Append<TSource, TTrans>(
Copy link
Member

@singlis singlis Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompositeReaderEstimator [](start = 22, length = 24)

So we now have a DataLoaderEstimator but now have a CompositeReaderEstimator. Should this also become a CompositeLoaderEstimator? And TrivialReaderEstimator->TrivialLoaderEstimator? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point!


In reply to: 260396675 [](ancestors = 260396675)

@@ -66,9 +66,9 @@ public static IDataView LoadPipeline(IHostEnvironment env, RepositoryReader rep,
env.CheckValue(files, nameof(files));
using (var ent = rep.OpenEntry(DirDataLoaderModel, ModelLoadContext.ModelStreamName))
{
IDataLoader loader;
ILegacyDataLoader loader;
Copy link
Member

@singlis singlis Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ILegacyDataLoader [](start = 16, length = 17)

Just curious - will all code that uses the ILegacyDataLoader be also moved to legacy? Or does it need to be updated to use the DataLoader? #Resolved

@@ -33,7 +33,7 @@ namespace Microsoft.ML.Data
/// Loads a parquet file into an IDataView. Supports basic mapping from Parquet input column data types to framework data types.
/// </summary>
[BestFriend]
internal sealed class ParquetLoader : IDataLoader, IDisposable
internal sealed class ParquetLoader : ILegacyDataLoader, IDisposable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ILegacyDataLoader [](start = 42, length = 17)

ah! I cant see ParquetLoader going to legacy, so this will need to be updated to DataLoader right? Are there any issues tracking this work? I would think this could happen post 1.0 but would still be good to track.

Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will open an issue regarding this. You are right, I wonder if @yaeldekel 's work related to #2735 will touch this.


In reply to: 260398136 [](ancestors = 260398136)

@singlis
Copy link
Member

singlis commented Feb 26, 2019

@artidoro I approve - but left comments on naming #Resolved

Copy link
Member

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @artidoro, and also thank you for making sure your commits were structured logically in the natural order of renamings. It made it much easier to review. I feel like we will probably need to do more work in the future to make sure documentation, parameters, and whatnot, are appropriately structured.

Label: c.LoadBool(0),
Features: c.LoadFloat(1, 9)
),
separator: '\t', hasHeader: true);

// Then, we use the reader to read the data as an IDataView.
var data = reader.Read(dataFilePath);
// Then, we use the reader to load the data as an IDataView.
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reader [](start = 32, length = 6)

loader #Resolved

}

private static IDataLoader CreateCore(IHost host, IDataLoader srcLoader,
KeyValuePair<string, IComponentFactory<IDataView, IDataTransform>>[] transformArgs)
public IDataView Load(TSource input)
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Load [](start = 25, length = 4)

Add a line of comment similar to the IDataLoader.Load that explains this is lazy. #Resolved

@@ -7,14 +7,14 @@

namespace Microsoft.ML
{
public static class DataReaderExtensions
public static class DataLoaderExtensions
{
/// <summary>
/// Reads data from one or more file <paramref name="path"/> into an <see cref="IDataView"/>.
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads [](start = 12, length = 5)

Load. #Resolved

{
/// <summary>
/// Reads data from one or more file <paramref name="path"/> into an <see cref="IDataView"/>.
/// </summary>
/// <param name="reader">The reader to use.</param>
/// <param name="loader">The loader to use.</param>
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one line saying that this is lazy. #Resolved

@@ -12,33 +12,33 @@ namespace Microsoft.ML
public static class BinaryLoaderSaverCatalog
{
/// <summary>
/// Read a data view from an <see cref="IMultiStreamSource"/> on a binary file using <see cref="BinaryLoader"/>.
/// Load a data view from an <see cref="IMultiStreamSource"/> on a binary file.
/// </summary>
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line explaining that this is lazy. #Resolved

}

/// <summary>
/// Read a data view from a binary file using <see cref="BinaryLoader"/>.
/// Load a data view from a binary file.
/// </summary>
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line explaining that this is lazy. #Resolved

allowSparse, trimWhitespace, dataSample: dataSample);

/// <summary>
/// Read a data view from a text file using <see cref="TextLoader"/>.
/// Load a data view from a text file using <see cref="TextLoader"/>.
/// </summary>
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line explaining that this is lazy. #Resolved

}

/// <summary>
/// Read a data view from a text file using <see cref="TextLoader"/>.
/// Load a data view from a text file using <see cref="TextLoader"/>.
/// </summary>
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line explaining that this is lazy. #Resolved

@artidoro
Copy link
Contributor Author

artidoro commented Feb 27, 2019

// The .NET Foundation licenses this file to you under the MIT license.

Update the cookbooks. #Resolved


Refers to: test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamples.cs:2 in 59b4492. [](commit_id = 59b4492, deletion_comment = False)

}

/// <summary>
/// Read a data view from a text file using <see cref="TextLoader"/>.
/// Load a data view from a text file using <see cref="TextLoader"/>.
/// </summary>
Copy link
Contributor Author

@artidoro artidoro Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line explaining that this is lazy. #Resolved

"Composite Data Loader", "CompositeDataLoader", "Composite", "PipeData", "Pipe", "PipeDataLoader")]

[assembly: LoadableClass(typeof(IDataLoader), typeof(CompositeDataLoader), null, typeof(SignatureLoadDataLoader),
"Pipe DataL Loader", CompositeDataLoader.LoaderSignature)]
Copy link
Member

@singlis singlis Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the entrypoints? Shouldn't these remain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see -- this is moved to LegacyCompositeDataLoader.


In reply to: 260564833 [](ancestors = 260564833)

Copy link
Member

@singlis singlis Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it better to git rename CompositeDataLoader -> LegacyCompositeDataLoader and then add a new file of CompositeDataLoader? The diff looks like you replaced the code in this file and added a LegacyCompositeDataLoader. The rename would help maintain history with the file.


In reply to: 260565188 [](ancestors = 260565188,260564833)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do thank you for pointing it out and taking another look at this.


In reply to: 260565382 [](ancestors = 260565382,260565188,260564833)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 27, 2019

Also need to rename ReadFromEnumerable to LoadFromEnumerable. #Resolved

@artidoro artidoro merged commit b1801a8 into dotnet:master Feb 27, 2019
@artidoro artidoro deleted the idataloader branch March 13, 2019 17:55
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants