-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Metadata fixes for the ValueMappingEstimator #2098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Values as KeyTypes: 1) The output schema for the Estimator did not contain the KeyType information in the metadata. 2) The reverse lookup of the metadata had the incorrect value. This now sets the correct metadata on the output schema and uses the value data for the reverse lookup. A test was added to confirm the changes using the KeyToValueMapping appended to a ValueMappingEstimator for the reverse lookup. Fixes dotnet#2086 Fixes dotnet#2083
…n a Host.Assert and would not execute on release build. This is changed to a Host.Check
@@ -191,18 +192,15 @@ private static ValueGetter<VBuffer<ReadOnlyMemory<char>>> GetKeyValueGetter<TKey | |||
// set of values. This is used for generating the metadata of | |||
// the column. | |||
HashSet<TValue> valueSet = new HashSet<TValue>(); | |||
HashSet<TKey> keySet = new HashSet<TKey>(); | |||
for (int i = 0; i < values.Count(); ++i) | |||
{ | |||
var v = values.ElementAt(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you replace it with foreach loop?
I found terrifying idea of fetching i-th element every time out of IEnumerable #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
var result = t.Transform(dataView); | ||
var cursor = result.GetRowCursor((col) => true); | ||
var getterD = cursor.GetGetter<ReadOnlyMemory<char>>(6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 [](start = 65, length = 1)
magic numbers, can you do result.Schema["DOutput"].Index? #Closed
var data = new[] { new TestClass() { A = "bar", B = "test", C = "notfound" } }; | ||
var dataView = ComponentCreation.CreateDataView(Env, data); | ||
|
||
IEnumerable<ReadOnlyMemory<char>> keys = new List<ReadOnlyMemory<char>>() { "foo".AsMemory(), "bar".AsMemory(), "test".AsMemory(), "wahoo".AsMemory() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEnumerable<ReadOnlyMemory> [](start = 12, length = 33)
what's wrong with var
? #Closed
|
||
// The expected values will contain the generated key type values starting from 1. | ||
ReadOnlyMemory<char> dValue = default; | ||
getterD(ref dValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dValue [](start = 24, length = 6)
can we validate it's a "bar"? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -100,13 +100,14 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema) | |||
var isKey = Transformer.ValueColumnType.IsKey; | |||
var columnType = (isKey) ? PrimitiveType.FromKind(DataKind.U4) : | |||
Transformer.ValueColumnType; | |||
var metadata = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var metadata = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema); | |
var metadataShape = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema); |
Maybe? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -100,13 +100,14 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema) | |||
var isKey = Transformer.ValueColumnType.IsKey; | |||
var columnType = (isKey) ? PrimitiveType.FromKind(DataKind.U4) : | |||
Transformer.ValueColumnType; | |||
var metadata = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema); | |||
foreach (var (Input, Output) in _columns) | |||
{ | |||
if (!inputSchema.TryFindColumn(Input, out var originalColumn)) | |||
throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", Input); | |||
|
|||
// Get the type from TOutputType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is TOutputType here? Is it not Output? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is old, I used to have the generic arguments be TInputType, TOutputType, but then changed to TKeyType and TValueType. I will update the comment.
In reply to: 247282739 [](ancestors = 247282739)
foreach (var (Input, Output) in _columns) | ||
{ | ||
if (!inputSchema.TryFindColumn(Input, out var originalColumn)) | ||
throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", Input); | ||
|
||
// Get the type from TOutputType | ||
var col = new SchemaShape.Column(Output, vectorKind, columnType, isKey, originalColumn.Metadata); | ||
var col = new SchemaShape.Column(Output, vectorKind, columnType, isKey, metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var col = new SchemaShape.Column(Output, vectorKind, columnType, isKey, metadata); | |
var col = new SchemaShape.Column(OutputColumnName, vectorKind, columnType, isKey, metadata); | |
``` #WontFix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although Name is more specific to what the object actually is, I think it goes against thinking of these as columns since you are talking about a input/output (or source/name). I would prefer to keep it as Output/Input. BTW - this will also go through another revision with #2064
In reply to: 247282892 [](ancestors = 247282892)
var values = new List<ReadOnlyMemory<char>>() { "foo1".AsMemory(), "foo2".AsMemory(), "foo1".AsMemory(), "foo3".AsMemory() }; | ||
|
||
var estimator = new ValueMappingEstimator<ReadOnlyMemory<char>, ReadOnlyMemory<char>>(Env, keys, values, true, new[] { ("A", "D"), ("B", "E"), ("C", "F") }) | ||
.Append(new KeyToValueMappingEstimator(Env, ("D","DOutput"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.Append(new KeyToValueMappingEstimator(Env, ("D","DOutput"))); | |
.Append(new KeyToValueMappingEstimator(Env, ("D", "DOutput"))); |
Please format all files touched. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Generating the list of strings for the key type values, note that foo1 is duplicated as intended to test that the same index value is returned | ||
var values = new List<ReadOnlyMemory<char>>() { "foo1".AsMemory(), "foo2".AsMemory(), "foo1".AsMemory(), "foo3".AsMemory() }; | ||
|
||
var estimator = new ValueMappingEstimator<ReadOnlyMemory<char>, ReadOnlyMemory<char>>(Env, keys, values, true, new[] { ("A", "D"), ("B", "E"), ("C", "F") }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need three columns? Only "D" is used below. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True - that is a result of copying code from another test. I have removed the other two columns.
In reply to: 247285144 [](ancestors = 247285144)
The ValueMappingEstimator had a couple of issues when using the Values as KeyTypes:
information in the metadata.
This now sets the correct metadata on the output schema and uses the
value data for the reverse lookup. A test was added to confirm the
changes using the KeyToValueMapping appended to a ValueMappingEstimator
for the reverse lookup.
Fixes #2086
Fixes #2083