Fix bug in TextLoader #3056

Merged (1 commit), Mar 22, 2019

9 changes: 3 additions & 6 deletions src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs
@@ -1288,12 +1288,9 @@ private static bool TryParseSchema(IHost host, IMultiStreamSource files,
 ch.Assert(h.Loader == null || h.Loader is ICommandLineComponentFactory);
 var loader = h.Loader as ICommandLineComponentFactory;
 
-if (loader == null || string.IsNullOrWhiteSpace(loader.Name))
-    goto LDone;
-
-// Make sure the loader binds to us.
-var info = host.ComponentCatalog.GetLoadableClassInfo<SignatureDataLoader>(loader.Name);
-if (info.Type != typeof(ILegacyDataLoader) || info.ArgType != typeof(Options))
+// Make sure that the schema is described using either the syntax TextLoader{<settings>} or the syntax Text{<settings>},
+// where "settings" is a string that can be parsed by CmdParser into an object of type TextLoader.Options.
+if (loader == null || string.IsNullOrWhiteSpace(loader.Name) || (loader.Name != LoaderSignature && loader.Name != "Text"))

Author review comment (anchored on Name, start = 88, length = 4):
The previous check verified that loader.Name was one of the names defined in TextLoader's LoadableClassAttribute. Since we can no longer rely on that attribute being registered in the ComponentCatalog, the new check explicitly requires loader.Name to be one of TextLoader's load names (LoaderSignature or "Text").

     goto LDone;
 
 var optionsNew = new Options();
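To make the behavioral change concrete, here is a minimal C# sketch that isolates the new name check as a stand-alone predicate. It is a hypothetical helper, not code from this PR: the class and method names (TextLoaderNameCheck, IsTextLoaderName) are made up for illustration, and the constant "TextLoader" is an assumption about the value of TextLoader.LoaderSignature. Like the diff, it uses C#'s ordinal, case-sensitive string equality, so only the exact load names pass.

```csharp
using System;

// Hypothetical stand-alone version of the name check added to TryParseSchema.
// Not part of ML.NET; names are illustrative only.
public static class TextLoaderNameCheck
{
    // Assumption: mirrors the value of TextLoader.LoaderSignature.
    private const string LoaderSignature = "TextLoader";

    // Returns true only for the two accepted load names, so an embedded schema must be
    // written as TextLoader{<settings>} or Text{<settings>}. The comparison is ordinal
    // and case-sensitive, like the != comparisons in the diff.
    public static bool IsTextLoaderName(string name) =>
        !string.IsNullOrWhiteSpace(name)
        && (name == LoaderSignature || name == "Text");
}
```

Because the predicate never consults host.ComponentCatalog, it gives the same answer whether or not TextLoader's LoadableClassAttribute has been registered, which is the situation the author's comment describes.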
13 changes: 13 additions & 0 deletions test/Microsoft.ML.Tests/TextLoaderTests.cs
@@ -598,6 +598,19 @@ public void ThrowsExceptionWithPropertyName()
     catch (NullReferenceException) { };
 }
 
+[Fact]
+public void ParseSchemaFromTextFile()
+{
+    var mlContext = new MLContext(seed: 1);
+    var fileName = GetDataPath(TestDatasets.adult.trainFilename);
+    var loader = mlContext.Data.CreateTextLoader(new TextLoader.Options(), new MultiFileSource(fileName));
+    var data = loader.Load(new MultiFileSource(fileName));
+    Assert.NotNull(data.Schema.GetColumnOrNull("Label"));
+    Assert.NotNull(data.Schema.GetColumnOrNull("Workclass"));
+    Assert.NotNull(data.Schema.GetColumnOrNull("Categories"));
+    Assert.NotNull(data.Schema.GetColumnOrNull("NumericFeatures"));
+}
+
 public class QuoteInput
 {
     [LoadColumn(0)]