Added documentation regarding TextLoader's hasHeader field #4655

Merged
merged 11 commits into from
Feb 7, 2020
6 changes: 5 additions & 1 deletion src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs
@@ -479,6 +479,10 @@ public class Options

/// <summary>
/// Whether the data file has a header with feature names.
+/// Note: If a TextLoader is created with hasHeader = true but without a dataSample, then vector columns made by TextLoader will not contain slot name
+/// annotations (slots being the elements of the given vector column), because the output schema is made when the TextLoader is made, and not when
+/// TextLoader.Load(IMultiStreamSource source) is called. In addition, the case where dataSample = null and hasHeader = true indicates to the
+/// loader that when it is given a file when <see cref="TextLoader.Load(IMultiStreamSource)"/> is called, it needs to skip the first line.
/// </summary>
[Argument(ArgumentType.AtMostOnce, ShortName = "header",
HelpText = "Data file has header with feature names. Header is read only if options 'hs' and 'hf' are not specified.")]
@@ -1557,4 +1561,4 @@ public DataViewRowCursor[] GetRowCursorSet(IEnumerable<DataViewSchema.Column> co
void ICanSaveModel.Save(ModelSaveContext ctx) => ((ICanSaveModel)_loader).Save(ctx);
}
}
-}
+}
24 changes: 20 additions & 4 deletions src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderSaverCatalog.cs
@@ -21,7 +21,11 @@ public static class TextLoaderSaverCatalog
/// <param name="catalog">The <see cref="DataOperationsCatalog"/> catalog.</param>
/// <param name="columns">Array of columns <see cref="TextLoader.Column"/> defining the schema.</param>
/// <param name="separatorChar">The character used as separator between data points in a row. By default the tab character is used as separator.</param>
-/// <param name="hasHeader">Whether the file has a header.</param>
+/// <param name="hasHeader">Whether the file has a header with feature names. Note: If a TextLoader is created with hasHeader = true but without a
+/// <paramref name="dataSample"/>, then vector columns made by TextLoader will not contain slot name annotations (slots being the elements of the given vector column),
+/// because the output schema is made when the TextLoader is made, and not when <see cref="TextLoader.Load(IMultiStreamSource)"/> is called.
+/// In addition, the case where dataSample = null and hasHeader = true indicates to the loader that when it is given a file when Load()
+/// is called, it needs to skip the first line.</param>
/// <param name="dataSample">The optional location of a data sample. The sample can be used to infer column names and number of slots in each column.</param>
/// <param name="allowQuoting">Whether the file can contain columns defined by a quoted string.</param>
/// <param name="trimWhitespace">Remove trailing whitespace from lines</param>
@@ -67,7 +71,11 @@ public static TextLoader CreateTextLoader(this DataOperationsCatalog catalog,
/// names and their data types in the schema of the loaded data.</typeparam>
/// <param name="catalog">The <see cref="DataOperationsCatalog"/> catalog.</param>
/// <param name="separatorChar">Column separator character. Default is '\t'</param>
-/// <param name="hasHeader">Does the file contains header?</param>
+/// <param name="hasHeader">Whether the file has a header with feature names. Note: If a TextLoader is created with hasHeader = true but without a
+/// <paramref name="dataSample"/>, then vector columns made by TextLoader will not contain slot name annotations (slots being the elements of the given vector column),
+/// because the output schema is made when the TextLoader is made, and not when <see cref="TextLoader.Load(IMultiStreamSource)"/> is called.
+/// In addition, the case where dataSample = null and hasHeader = true indicates to the loader that when it is given a file when Load()
+/// is called, it needs to skip the first line.</param>
/// <param name="dataSample">The optional location of a data sample. The sample can be used to infer information
/// about the columns, such as slot names.</param>
/// <param name="allowQuoting">Whether the input may include quoted values,
@@ -97,7 +105,11 @@ public static TextLoader CreateTextLoader<TInput>(this DataOperationsCatalog cat
/// <param name="path">The path to the file.</param>
/// <param name="columns">The columns of the schema.</param>
/// <param name="separatorChar">The character used as separator between data points in a row. By default the tab character is used as separator.</param>
-/// <param name="hasHeader">Whether the file has a header.</param>
+/// <param name="hasHeader">Whether the file has a header with feature names. Note: If a TextLoader is created with hasHeader = true but without a
+/// dataSample, then vector columns made by TextLoader will not contain slot name annotations (slots being the elements of the given vector column),
+/// because the output schema is made when the TextLoader is made, and not when <see cref="TextLoader.Load(IMultiStreamSource)"/> is called.
+/// In addition, the case where dataSample = null and hasHeader = true indicates to the loader that when it is given a file when Load()
+/// is called, it needs to skip the first line.</param>
/// <param name="allowQuoting">Whether the file can contain columns defined by a quoted string.</param>
/// <param name="trimWhitespace">Remove trailing whitespace from lines</param>
/// <param name="allowSparse">Whether the file can contain numerical vectors in sparse format.</param>
@@ -134,7 +146,11 @@ public static IDataView LoadFromTextFile(this DataOperationsCatalog catalog,
/// <param name="catalog">The <see cref="DataOperationsCatalog"/> catalog.</param>
/// <param name="path">The path to the file.</param>
/// <param name="separatorChar">Column separator character. Default is '\t'</param>
-/// <param name="hasHeader">Does the file contains header?</param>
+/// <param name="hasHeader">Whether the file has a header with feature names. Note: If a TextLoader is created with hasHeader = true but without a
+/// dataSample, then vector columns made by TextLoader will not contain slot name annotations (slots being the elements of the given vector column),
+/// because the output schema is made when the TextLoader is made, and not when <see cref="TextLoader.Load(IMultiStreamSource)"/> is called.
+/// In addition, the case where dataSample = null and hasHeader = true indicates to the loader that when it is given a file when Load()
+/// is called, it needs to skip the first line.</param>
/// <param name="allowQuoting">Whether the input may include quoted values,
/// which can contain separator characters, colons,
/// and distinguish empty values from missing values. When true, consecutive separators
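The behavior described in the new hasHeader documentation can be checked with a small sketch like the one below. It is not part of this PR. It assumes the Microsoft.ML NuGet package is referenced; the class name `HasHeaderSketch`, the method `FeaturesHaveSlotNames`, the temp file name, and the four-column sample data are all hypothetical and exist only for illustration. Going by the note in the doc comments, the call without a dataSample should report no slot names, while the call with a dataSample should report that the header was picked up.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class HasHeaderSketch
{
    // Builds a loader with one four-slot vector column and reports whether the
    // loader's output schema carries slot name annotations for that column.
    public static bool FeaturesHaveSlotNames(MLContext ml, IMultiStreamSource dataSample)
    {
        var columns = new[] { new TextLoader.Column("Features", DataKind.Single, 0, 3) };

        TextLoader loader = ml.Data.CreateTextLoader(
            columns, separatorChar: ',', hasHeader: true, dataSample: dataSample);

        // The output schema is fixed at this point, before Load() is ever called.
        return loader.GetOutputSchema()["Features"].HasSlotNames();
    }

    public static void Main()
    {
        // Hypothetical sample file: one header line followed by one data row.
        string path = Path.Combine(Path.GetTempPath(), "hasheader-sketch.csv");
        File.WriteAllLines(path, new[] { "a,b,c,d", "1,2,3,4" });

        var ml = new MLContext();
        var sample = new MultiFileSource(path);

        // hasHeader = true, no dataSample: the header is only skipped at Load() time.
        Console.WriteLine(FeaturesHaveSlotNames(ml, null));

        // hasHeader = true, with dataSample: the header is read when the loader is created.
        Console.WriteLine(FeaturesHaveSlotNames(ml, sample));
    }
}
```

In both cases, a later call to `loader.Load(...)` on a file with a header skips the first line; only the presence of slot names in the schema differs.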