Skip to content

Fix a value-mapping bug #3180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 4, 2019
Merged

Fix a value-mapping bug #3180

merged 4 commits into from
Apr 4, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Apr 2, 2019

This PR fixes #3166.

@wschin wschin added the bug Something isn't working label Apr 2, 2019
@wschin wschin self-assigned this Apr 2, 2019
@wschin wschin requested review from singlis and sfilipi April 2, 2019 21:43
@codecov
Copy link

codecov bot commented Apr 2, 2019

Codecov Report

Merging #3180 into master will increase coverage by 0.06%.
The diff coverage is 97.19%.

@@            Coverage Diff             @@
##           master    #3180      +/-   ##
==========================================
+ Coverage   72.54%    72.6%   +0.06%     
==========================================
  Files         807      807              
  Lines      144774   145077     +303     
  Branches    16208    16213       +5     
==========================================
+ Hits       105021   105335     +314     
+ Misses      35339    35324      -15     
- Partials     4414     4418       +4
Flag Coverage Δ
#Debug 72.6% <97.19%> (+0.06%) ⬆️
#production 68.15% <91.42%> (+0.02%) ⬆️
#test 88.92% <100%> (+0.09%) ⬆️
Impacted Files Coverage Δ
...crosoft.ML.Tests/Transformers/ValueMappingTests.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Data/Transforms/ValueMapping.cs 84.26% <91.22%> (-0.14%) ⬇️
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs 64.07% <92.3%> (+19.2%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0%> (-20.52%) ⬇️
src/Microsoft.ML.DataView/KeyDataViewType.cs 74.57% <0%> (-3.76%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
test/Microsoft.ML.Tests/ImagesTests.cs 98.69% <0%> (-0.13%) ⬇️
...Microsoft.ML.Tests/Transformers/NormalizerTests.cs 100% <0%> (ø) ⬆️
src/Microsoft.ML.Transforms/Text/TextCatalog.cs 41.66% <0%> (ø) ⬆️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.26% <0%> (+0.15%) ⬆️
... and 5 more

/// <param name="keyColumn">Name of the key column in <paramref name="lookupMap"/>.</param>
/// <param name="valueColumn">Name of the value column in <paramref name="lookupMap"/>.</param>
/// <param name="columns">The list of names of the input columns to apply the transformation, and the name of the resulting column.</param>
internal ValueMappingEstimator(IHostEnvironment env, IDataView lookupMap, DataViewSchema.Column keyColumn, DataViewSchema.Column valueColumn, params (string outputColumnName, string inputColumnName)[] columns)
Copy link
Member

@singlis singlis Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are adding this constructor, is the one above (that takes a key column name and value column name) needed? #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, especially since the extension that calls into the ctor, takes in DataViewSchema.Column and passes down only the name of the column.


In reply to: 271533565 [](ancestors = 271533565)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it in the next iteration.


In reply to: 271577343 [](ancestors = 271577343,271533565)

var valueType = dataView.Schema[valueIdx].Type;
var valueMap = ValueMap.Create(keyType, valueType, ValueColumnMetadata);
using (var cursor = dataView.GetRowCursor(dataView.Schema[keyIdx], dataView.Schema[valueIdx]))
var retrievedKeyColumn = dataView.Schema[keyColumn.Index];
Copy link
Member

@singlis singlis Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema [](start = 46, length = 6)

Can this be out of range? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a check in the next iteration.


In reply to: 271535033 [](ancestors = 271535033)

Host.Check(retrievedKeyColumn.Type == keyColumn.Type, nameof(keyColumn) + "'s column type doesn't match that of the associated column in " + nameof(dataView));
Host.Check(retrievedKeyColumn.Annotations == keyColumn.Annotations, nameof(keyColumn) + "'s column annotions don't match those of the associated column in " + nameof(dataView));

var retrievedValueColumn = dataView.Schema[valueColumn.Index];
Copy link
Member

@singlis singlis Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a check as well.


In reply to: 271535097 [](ancestors = 271535097)

private ValueMap _valueMap;
private readonly byte[] _dataView;

private readonly DataViewSchema.Column _lookupKeyColumn;
private readonly DataViewSchema.Column _lookupValueColumn;
Copy link
Member

@singlis singlis Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private [](start = 8, length = 7)

Why store these? They look like they can be removed. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice catch. Thanks.


In reply to: 271535842 [](ancestors = 271535842)

Copy link
Member

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -46,7 +46,22 @@ public class ValueMappingEstimator : TrivialEstimator<ValueMappingTransformer>
/// <param name="columns">The list of names of the input columns to apply the transformation, and the name of the resulting column.</param>
internal ValueMappingEstimator(IHostEnvironment env, IDataView lookupMap, string keyColumn, string valueColumn, params (string outputColumnName, string inputColumnName)[] columns)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueMappingEstimator [](start = 17, length = 21)

I'm not getting why we still have this constructor
Public interface operates over Schema.Column object, any conversion from string to Schema.Column is unrelaable, why it's still around? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed and related code is updated as well.


In reply to: 271544302 [](ancestors = 271544302)

: base(Contracts.CheckRef(env, nameof(env)).Register(nameof(ValueMappingTransformer)), columns)
{
Host.CheckNonEmpty(keyColumn, nameof(keyColumn), "A key column must be specified when passing in an IDataView for the value mapping");
Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Host.CheckNonEmpty [](start = 11, length = 19)

maybe check lookupKeyColumn.HasValue #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't. This is a struct, not a class.


In reply to: 271577590 [](ancestors = 271577590)

// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var mlContext = new MLContext();

Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this, the file should already have it: ML #Closed

// Getting the resulting data as an IEnumerable.
var features = mlContext.Data.CreateEnumerable<TransformedData>(transformedData, reuseRowObject: false).ToList();

var expectedPrices = new float[] { 3.14f, 2000f, 1.19f, 2.17f, 33.784f };
Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expectedPrices [](start = 16, length = 14)

you have rawData storing those #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thank you.


In reply to: 271578355 [](ancestors = 271578355)

_lookupKeyColumn = lookupKeyColumn;
_lookupValueColumn = lookupValueColumn;
_valueMap = CreateValueMapFromDataView(lookupMap, _lookupKeyColumn, _lookupValueColumn);
ValueColumnMetadata = lookupMap.Schema[_lookupValueColumn.Index].Annotations;

Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does _lookupValueColumn.Annotations work? #Resolved

Copy link
Member Author

@wschin wschin Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a small refactorization around Annotations and extent a test to cover it at Iteration 4.


In reply to: 271578470 [](ancestors = 271578470)

var retrievedKeyColumn = dataView.Schema[keyColumn.Index];
Host.Check(retrievedKeyColumn.Index == keyColumn.Index, nameof(keyColumn) + "'s column index doesn't match that of the associated column in " + nameof(dataView));
Host.Check(retrievedKeyColumn.Type == keyColumn.Type, nameof(keyColumn) + "'s column type doesn't match that of the associated column in " + nameof(dataView));
Host.Check(retrievedKeyColumn.Annotations == keyColumn.Annotations, nameof(keyColumn) + "'s column annotions don't match those of the associated column in " + nameof(dataView));
Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotions [](start = 111, length = 9)

typo #Resolved

var valueType = dataView.Schema[valueIdx].Type;
var valueMap = ValueMap.Create(keyType, valueType, ValueColumnMetadata);
using (var cursor = dataView.GetRowCursor(dataView.Schema[keyIdx], dataView.Schema[valueIdx]))
var retrievedKeyColumn = dataView.Schema[keyColumn.Index];
Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retrievedKeyColumn [](start = 16, length = 18)

do we have any overload for Equals on the DataViewSchema.Column? The checks below seem like they would belong in that Equals. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't.


In reply to: 271578862 [](ancestors = 271578862)

Copy link
Member

@sfilipi sfilipi Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a bug about it.. we should have it.


In reply to: 271966741 [](ancestors = 271966741,271578862)

@@ -753,26 +774,29 @@ private protected override void SaveModel(ModelSaveContext ctx)
/// </summary>
private abstract class ValueMap
{
public readonly string KeyColumnName;
Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyColumnName [](start = 35, length = 13)

what do you think about using the entire column, rather than the name? we have a mix of storing indices, names, columns.. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge Name and Type into DataViewSchema.Column and clean redundant functions up.


In reply to: 271579703 [](ancestors = 271579703)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

return new ValueMappingEstimator<TInputType, TOutputType>(catalog.GetEnvironment(), lookupMap,
lookupMap.Schema[ValueMappingTransformer.DefaultKeyColumnName],
lookupMap.Schema[ValueMappingTransformer.DefaultValueColumnName],
InputOutputColumnPair.ConvertToValueTuples(columns));
Copy link
Member

@sfilipi sfilipi Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a bit of repetition here, does it make sense to make it part of the ctor? I mean create the lookup map inside the ctor, just pass the columns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel columns must go with the associated IDataView. There should be no column if its IDataView hasn't been created.

@wschin wschin merged commit b8a70ac into dotnet:master Apr 4, 2019
@wschin wschin deleted the fix-value-map branch April 4, 2019 22:28
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
4 participants