-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix a value-mapping bug #3180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a value-mapping bug #3180
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3180 +/- ##
==========================================
+ Coverage 72.54% 72.6% +0.06%
==========================================
Files 807 807
Lines 144774 145077 +303
Branches 16208 16213 +5
==========================================
+ Hits 105021 105335 +314
+ Misses 35339 35324 -15
- Partials 4414 4418 +4
|
/// <param name="keyColumn">Name of the key column in <paramref name="lookupMap"/>.</param> | ||
/// <param name="valueColumn">Name of the value column in <paramref name="lookupMap"/>.</param> | ||
/// <param name="columns">The list of names of the input columns to apply the transformation, and the name of the resulting column.</param> | ||
internal ValueMappingEstimator(IHostEnvironment env, IDataView lookupMap, DataViewSchema.Column keyColumn, DataViewSchema.Column valueColumn, params (string outputColumnName, string inputColumnName)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are adding this constructor, is the one above (that takes a key column name and value column name) needed? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, especially since the extension that calls into the ctor, takes in DataViewSchema.Column and passes down only the name of the column.
In reply to: 271533565 [](ancestors = 271533565)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var valueType = dataView.Schema[valueIdx].Type; | ||
var valueMap = ValueMap.Create(keyType, valueType, ValueColumnMetadata); | ||
using (var cursor = dataView.GetRowCursor(dataView.Schema[keyIdx], dataView.Schema[valueIdx])) | ||
var retrievedKeyColumn = dataView.Schema[keyColumn.Index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema [](start = 46, length = 6)
Can this be out of range? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Host.Check(retrievedKeyColumn.Type == keyColumn.Type, nameof(keyColumn) + "'s column type doesn't match that of the associated column in " + nameof(dataView)); | ||
Host.Check(retrievedKeyColumn.Annotations == keyColumn.Annotations, nameof(keyColumn) + "'s column annotions don't match those of the associated column in " + nameof(dataView)); | ||
|
||
var retrievedValueColumn = dataView.Schema[valueColumn.Index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private ValueMap _valueMap; | ||
private readonly byte[] _dataView; | ||
|
||
private readonly DataViewSchema.Column _lookupKeyColumn; | ||
private readonly DataViewSchema.Column _lookupValueColumn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private [](start = 8, length = 7)
Why store these? They look like they can be removed. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -46,7 +46,22 @@ public class ValueMappingEstimator : TrivialEstimator<ValueMappingTransformer> | |||
/// <param name="columns">The list of names of the input columns to apply the transformation, and the name of the resulting column.</param> | |||
internal ValueMappingEstimator(IHostEnvironment env, IDataView lookupMap, string keyColumn, string valueColumn, params (string outputColumnName, string inputColumnName)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueMappingEstimator [](start = 17, length = 21)
I'm not getting why we still have this constructor
Public interface operates over Schema.Column object, any conversion from string to Schema.Column is unrelaable, why it's still around? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: base(Contracts.CheckRef(env, nameof(env)).Register(nameof(ValueMappingTransformer)), columns) | ||
{ | ||
Host.CheckNonEmpty(keyColumn, nameof(keyColumn), "A key column must be specified when passing in an IDataView for the value mapping"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Host.CheckNonEmpty [](start = 11, length = 19)
maybe check lookupKeyColumn.HasValue #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var mlContext = new MLContext(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this, the file should already have it: ML #Closed
// Getting the resulting data as an IEnumerable. | ||
var features = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false).ToList(); | ||
|
||
var expectedPrices = new float[] { 3.14f, 2000f, 1.19f, 2.17f, 33.784f }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expectedPrices [](start = 16, length = 14)
you have rawData storing those #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_lookupKeyColumn = lookupKeyColumn; | ||
_lookupValueColumn = lookupValueColumn; | ||
_valueMap = CreateValueMapFromDataView(lookupMap, _lookupKeyColumn, _lookupValueColumn); | ||
ValueColumnMetadata = lookupMap.Schema[_lookupValueColumn.Index].Annotations; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does _lookupValueColumn.Annotations work? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a small refactorization around Annotations
and extent a test to cover it at Iteration 4.
In reply to: 271578470 [](ancestors = 271578470)
var retrievedKeyColumn = dataView.Schema[keyColumn.Index]; | ||
Host.Check(retrievedKeyColumn.Index == keyColumn.Index, nameof(keyColumn) + "'s column index doesn't match that of the associated column in " + nameof(dataView)); | ||
Host.Check(retrievedKeyColumn.Type == keyColumn.Type, nameof(keyColumn) + "'s column type doesn't match that of the associated column in " + nameof(dataView)); | ||
Host.Check(retrievedKeyColumn.Annotations == keyColumn.Annotations, nameof(keyColumn) + "'s column annotions don't match those of the associated column in " + nameof(dataView)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotions [](start = 111, length = 9)
typo #Resolved
var valueType = dataView.Schema[valueIdx].Type; | ||
var valueMap = ValueMap.Create(keyType, valueType, ValueColumnMetadata); | ||
using (var cursor = dataView.GetRowCursor(dataView.Schema[keyIdx], dataView.Schema[valueIdx])) | ||
var retrievedKeyColumn = dataView.Schema[keyColumn.Index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrievedKeyColumn [](start = 16, length = 18)
do we have any overload for Equals on the DataViewSchema.Column? The checks below seem like they would belong in that Equals. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open a bug about it.. we should have it.
In reply to: 271966741 [](ancestors = 271966741,271578862)
@@ -753,26 +774,29 @@ private protected override void SaveModel(ModelSaveContext ctx) | |||
/// </summary> | |||
private abstract class ValueMap | |||
{ | |||
public readonly string KeyColumnName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyColumnName [](start = 35, length = 13)
what do you think about using the entire column, rather than the name? we have a mix of storing indices, names, columns.. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge Name
and Type
into DataViewSchema.Column
and clean redundant functions up.
In reply to: 271579703 [](ancestors = 271579703)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new ValueMappingEstimator<TInputType, TOutputType>(catalog.GetEnvironment(), lookupMap, | ||
lookupMap.Schema[ValueMappingTransformer.DefaultKeyColumnName], | ||
lookupMap.Schema[ValueMappingTransformer.DefaultValueColumnName], | ||
InputOutputColumnPair.ConvertToValueTuples(columns)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a bit of repetition here, does it make sense to make it part of the ctor? I mean create the lookup map inside the ctor, just pass the columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel columns must go with the associated IDataView
. There should be no column if its IDataView
hasn't been created.
This PR fixes #3166.