Skip to content

swapping the order or arguments on the constructors of the ConversionExtensionsCatalog. Internalizing the constructors. #2118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 10 additions & 10 deletions docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValue_Term.cs
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,11 @@ public static void KeyToValue_Term()
{
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var ml = new MLContext();
var mlContext = new MLContext();

// Get a small dataset as an IEnumerable.
IEnumerable<SamplesUtils.DatasetUtils.SampleTopicsData> data = SamplesUtils.DatasetUtils.GetTopicsData();
var trainData = ml.CreateStreamingDataView(data);
var trainData = mlContext.CreateStreamingDataView(data);

// Preview of one of the columns of the the topics data.
// The Review column contains the keys associated with a particular body of text.
Expand All @@ -31,16 +31,16 @@ public static void KeyToValue_Term()
// making use of default settings.
string defaultColumnName = "DefaultKeys";
// REVIEW create through the catalog extension
var default_pipeline = new WordTokenizingEstimator(ml, "Review")
.Append(new ValueToKeyMappingEstimator(ml, "Review", defaultColumnName));
var default_pipeline = mlContext.Transforms.Text.TokenizeWords("Review")
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_pipeline [](start = 16, length = 16)

Ugh. Our samples have been written with Python variable naming conventions, not C# ones? I'll have to file an issue about this... #2155. #WontFix

.Append(mlContext.Transforms.Conversion.MapValueToKey(defaultColumnName, "Review"));

// Another pipeline, that customizes the advanced settings of the TermEstimator.
// We can change the maxNumTerm to limit how many keys will get generated out of the set of words,
// and condition the order in which they get evaluated by changing sort from the default Occurence (order in which they get encountered)
// to value/alphabetically.
string customizedColumnName = "CustomizedKeys";
var customized_pipeline = new WordTokenizingEstimator(ml, "Review")
.Append(new ValueToKeyMappingEstimator(ml, "Review", customizedColumnName, maxNumTerms: 10, sort: ValueToKeyMappingTransformer.SortOrder.Value));
var customized_pipeline = mlContext.Transforms.Text.TokenizeWords("Review")
.Append(mlContext.Transforms.Conversion.MapValueToKey(customizedColumnName, "Review", maxNumTerms: 10, sort: ValueToKeyMappingTransformer.SortOrder.Value));

// The transformed data.
var transformedData_default = default_pipeline.Fit(trainData).Transform(trainData);
Expand All @@ -61,7 +61,7 @@ public static void KeyToValue_Term()
};

// Preview of the DefaultKeys column obtained after processing the input.
var defaultColumn = transformedData_default.GetColumn<VBuffer<uint>>(ml, defaultColumnName);
var defaultColumn = transformedData_default.GetColumn<VBuffer<uint>>(mlContext, defaultColumnName);
printHelper(defaultColumnName, defaultColumn);

// DefaultKeys column obtained post-transformation.
Expand All @@ -72,7 +72,7 @@ public static void KeyToValue_Term()
// 9 10 11 12 13 6

// Previewing the CustomizedKeys column obtained after processing the input.
var customizedColumn = transformedData_customized.GetColumn<VBuffer<uint>>(ml, customizedColumnName);
var customizedColumn = transformedData_customized.GetColumn<VBuffer<uint>>(mlContext, customizedColumnName);
printHelper(customizedColumnName, customizedColumn);

// CustomizedKeys column obtained post-transformation.
Expand All @@ -84,11 +84,11 @@ public static void KeyToValue_Term()

// Retrieve the original values, by appending the KeyToValue etimator to the existing pipelines
// to convert the keys back to the strings.
var pipeline = default_pipeline.Append(new KeyToValueMappingEstimator(ml, defaultColumnName));
var pipeline = default_pipeline.Append(new KeyToValueMappingEstimator(mlContext, defaultColumnName));
transformedData_default = pipeline.Fit(trainData).Transform(trainData);

// Preview of the DefaultColumnName column obtained.
var originalColumnBack = transformedData_default.GetColumn<VBuffer<ReadOnlyMemory<char>>>(ml, defaultColumnName);
var originalColumnBack = transformedData_default.GetColumn<VBuffer<ReadOnlyMemory<char>>>(mlContext, defaultColumnName);

foreach (var row in originalColumnBack)
{
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Data/Commands/CrossValidationCommand.cs
Original file line number Diff line number Diff line change
Expand Up @@ -330,7 +330,7 @@ private string GetSplitColumn(IChannel ch, IDataView input, ref IDataView output
int inc = 0;
while (input.Schema.TryGetColumnIndex(stratificationColumn, out tmp))
stratificationColumn = string.Format("{0}_{1:000}", origStratCol, ++inc);
output = new HashingEstimator(Host, origStratCol, stratificationColumn, 30).Fit(input).Transform(input);
output = new HashingEstimator(Host, stratificationColumn, origStratCol, 30).Fit(input).Transform(input);
}
}

Expand Down
6 changes: 6 additions & 0 deletions src/Microsoft.ML.Data/Properties/AssemblyInfo.cs
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,11 @@
using System.Runtime.CompilerServices;
using Microsoft.ML;

[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Benchmarks" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.TestFramework" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Tests" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Core.Tests" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.CpuMath.PerformanceTests" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.InferenceTesting" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.OnnxTransformTest" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Predictor.Tests" + PublicKey.TestValue)]
Expand All @@ -20,6 +22,10 @@
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Api" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Ensemble" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.FastTree" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.DnnImageFeaturizer.AlexNet" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.DnnImageFeaturizer.ResNet101" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.DnnImageFeaturizer.ResNet18" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.DnnImageFeaturizer.ResNet50" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.HalLearners" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.KMeansClustering" + PublicKey.Value)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.LightGBM" + PublicKey.Value)]
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Data/TrainContext.cs
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ private void EnsureStratificationColumn(ref IDataView data, ref string stratific
// Generate a new column with the hashed stratification column.
while (data.Schema.TryGetColumnIndex(stratificationColumn, out tmp))
stratificationColumn = string.Format("{0}_{1:000}", origStratCol, ++inc);
data = new HashingEstimator(Host, origStratCol, stratificationColumn, 30).Fit(data).Transform(data);
data = new HashingEstimator(Host, stratificationColumn, origStratCol, 30).Fit(data).Transform(data);
}
}
}
Expand Down
57 changes: 21 additions & 36 deletions src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs
Original file line number Diff line number Diff line change
Expand Up @@ -20,16 +20,16 @@ public static class ConversionsExtensionsCatalog
/// Hashes the values in the input column.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="inputColumn">Name of the input column.</param>
/// <param name="outputColumn">Name of the column to be transformed. If this is null '<paramref name="inputColumn"/>' will be used.</param>
/// <param name="name">Name of the column resulting from the transformation of <paramref name="source"/>.</param>
/// <param name="source">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="name"/> will be used as source.</param>
/// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>
/// <param name="invertHash">During hashing we constuct mappings between original values and the produced hash values.
/// Text representation of original values are stored in the slot names of the metadata for the new column.Hashing, as such, can map many initial values to one.
/// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
public static HashingEstimator Hash(this TransformsCatalog.ConversionTransforms catalog, string inputColumn, string outputColumn = null,
public static HashingEstimator Hash(this TransformsCatalog.ConversionTransforms catalog, string name, string source = null,
int hashBits = HashDefaults.HashBits, int invertHash = HashDefaults.InvertHash)
=> new HashingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn, outputColumn, hashBits, invertHash);
=> new HashingEstimator(CatalogUtils.GetEnvironment(catalog), name, source ?? name, hashBits, invertHash);

/// <summary>
/// Hashes the values in the input column.
Expand All @@ -43,12 +43,12 @@ public static HashingEstimator Hash(this TransformsCatalog.ConversionTransforms
/// Changes column type of the input column.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="inputColumn">Name of the input column.</param>
/// <param name="outputColumn">Name of the column to be transformed. If this is null '<paramref name="inputColumn"/>' will be used.</param>
/// <param name="name">Name of the column resulting from the transformation of <paramref name="source"/>.</param>
/// <param name="source">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="name"/> will be used as source.</param>
/// <param name="outputKind">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>
public static TypeConvertingEstimator ConvertType(this TransformsCatalog.ConversionTransforms catalog, string inputColumn, string outputColumn = null,
public static TypeConvertingEstimator ConvertType(this TransformsCatalog.ConversionTransforms catalog, string name, string source = null,
DataKind outputKind = ConvertDefaults.DefaultOutputKind)
=> new TypeConvertingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn, outputColumn, outputKind);
=> new TypeConvertingEstimator(CatalogUtils.GetEnvironment(catalog), name, source, outputKind);

/// <summary>
/// Changes column type of the input column.
Expand All @@ -62,9 +62,9 @@ public static TypeConvertingEstimator ConvertType(this TransformsCatalog.Convers
/// Convert the key types back to their original values.
/// </summary>
/// <param name="catalog">The categorical transform's catalog.</param>
/// <param name="inputColumn">Name of the input column.</param>
public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.ConversionTransforms catalog, string inputColumn)
=> new KeyToValueMappingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn);
/// <param name="source">Name of the input column.</param>
public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.ConversionTransforms catalog, string source)
=> new KeyToValueMappingEstimator(CatalogUtils.GetEnvironment(catalog), source);

/// <summary>
/// Convert the key types (name of the column specified in the first item of the tuple) back to their original values
Expand All @@ -88,43 +88,28 @@ public static KeyToVectorMappingEstimator MapKeyToVector(this TransformsCatalog.
/// Convert the key types back to their original vectors.
/// </summary>
/// <param name="catalog">The categorical transform's catalog.</param>
/// <param name="inputColumn">The name of the input column.</param>
/// <param name="outputColumn">The name of the output column.</param>
/// <param name="name">Name of the column resulting from the transformation of <paramref name="source"/>.</param>
/// <param name="source">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="name"/> will be used as source.</param>
/// <param name="bag">Whether bagging is used for the conversion. </param>
public static KeyToVectorMappingEstimator MapKeyToVector(this TransformsCatalog.ConversionTransforms catalog,
string inputColumn, string outputColumn = null, bool bag = KeyToVectorMappingEstimator.Defaults.Bag)
=> new KeyToVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn, outputColumn, bag);
string name, string source = null, bool bag = KeyToVectorMappingEstimator.Defaults.Bag)
=> new KeyToVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), name, source ?? name, bag);

/// <summary>
/// Converts value types into <see cref="KeyType"/>.
/// </summary>
/// <param name="catalog">The categorical transform's catalog.</param>
/// <param name="inputColumn">Name of the column to be transformed.</param>
/// <param name="outputColumn">Name of the output column. If this is null '<paramref name="inputColumn"/>' will be used.</param>
/// <param name="name">Name of the column resulting from the transformation of <paramref name="source"/>.</param>
/// <param name="source">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="name"/> will be used as source.</param>
/// <param name="maxNumTerms">Maximum number of keys to keep per column when auto-training.</param>
/// <param name="sort">How items should be ordered when vectorized. If <see cref="ValueToKeyMappingTransformer.SortOrder.Occurrence"/> choosen they will be in the order encountered.
/// If <see cref="ValueToKeyMappingTransformer.SortOrder.Value"/>, items are sorted according to their default comparison, for example, text sorting will be case sensitive (for example, 'A' then 'Z' then 'a').</param>
public static ValueToKeyMappingEstimator MapValueToKey(this TransformsCatalog.ConversionTransforms catalog,
string inputColumn,
string outputColumn = null,
string name,
string source = null,
int maxNumTerms = ValueToKeyMappingEstimator.Defaults.MaxNumTerms,
ValueToKeyMappingTransformer.SortOrder sort = ValueToKeyMappingEstimator.Defaults.Sort)
=> new ValueToKeyMappingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn, outputColumn, maxNumTerms, sort);

/// <summary>
/// Converts value types into <see cref="KeyType"/> loading the keys to use from <paramref name="file"/>.
/// </summary>
/// <param name="catalog">The categorical transform's catalog.</param>
/// <param name="columns">The data columns to map to keys.</param>
/// <param name="file">The path of the file containing the terms.</param>
/// <param name="termsColumn"></param>
/// <param name="loaderFactory"></param>
public static ValueToKeyMappingEstimator MapValueToKey(this TransformsCatalog.ConversionTransforms catalog,
ValueToKeyMappingTransformer.ColumnInfo[] columns,
string file = null,
string termsColumn = null,
IComponentFactory<IMultiStreamSource, IDataLoader> loaderFactory = null)
=> new ValueToKeyMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns, file, termsColumn, loaderFactory);
=> new ValueToKeyMappingEstimator(CatalogUtils.GetEnvironment(catalog), name, source, maxNumTerms, sort);

/// <summary>
/// Maps specified keys to specified values
Expand All @@ -141,7 +126,7 @@ public static ValueMappingEstimator<TInputType, TOutputType> ValueMap<TInputType
this TransformsCatalog.ConversionTransforms catalog,
IEnumerable<TInputType> keys,
IEnumerable<TOutputType> values,
params (string source, string name)[] columns)
params (string inputColumn, string outputColumn)[] columns)
Copy link
Contributor

@TomFinley TomFinley Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputColumn, string outputColumn [](start = 27, length = 32)

Remember, name, source, not source, name. And definitely not input, output. :) #Pending

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think for conistency sake we want the column mapping to be at the head, even when we have to forgo the convenience of the params.


In reply to: 248079402 [](ancestors = 248079402)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this on the next PR, because the tuple is passed directly from the estimator -> transfomer-> OneToOnetransformer base class.

I agree that it is confusing to review, because i did partial work with renaming the tuple items.


In reply to: 248079667 [](ancestors = 248079667,248079402)

=> new ValueMappingEstimator<TInputType, TOutputType>(CatalogUtils.GetEnvironment(catalog), keys, values, columns);
}
}
Loading