Skip to content

NAReplace estimator #917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 19, 2018
Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

Converts NAReplace to estimator

@Ivanidzo4ka Ivanidzo4ka self-assigned this Sep 14, 2018
@Ivanidzo4ka Ivanidzo4ka added the API Issues pertaining the friendly API label Sep 14, 2018
@Ivanidzo4ka Ivanidzo4ka added this to the 0918 milestone Sep 14, 2018
@Zruty0 Zruty0 mentioned this pull request Sep 14, 2018
public readonly bool ImputeBySlot;
public readonly ReplacementKind Kind;

public ColumnInfo(string input, string output, ReplacementKind kind = ReplacementKind.DefaultValue, bool imputeBySlot = true)
Copy link
Contributor

@Zruty0 Zruty0 Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnInfo [](start = 19, length = 10)

summary comment #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good comment?


In reply to: 217834730 [](ancestors = 217834730)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary: Describes how the transformer handles one column pair.
input: name of input column
output: name of output column
replacementMode: what to replace the missing value with
imputeBySlot: if true, per-slot imputation of replacement is performed. Otherwise, replacement value is imputed for the entire vector column. This setting is ignored for scalars and variable vectors, where imputation is always for the entire column.


In reply to: 217855806 [](ancestors = 217855806,217834730)

return columns.Select(x => (x.Input, x.Output)).ToArray();
}

///IVAN: move to mapper.
Copy link
Contributor

@Zruty0 Zruty0 Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to mapper. [](start = 17, length = 15)

maybe just turn into one 'type' ? The rest is accessible via _parent.ColumnPairs #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually quite tempted to move whole ColInfo into MapperBase.
Since it's heavily used almost in every mapper I wrote.

But yeah, I have _types array which I will use in Estimator.


In reply to: 217834859 [](ancestors = 217834859)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving ColInfo to MapperBase will be a bit unwieldy I think.


In reply to: 217836517 [](ancestors = 217836517,217834859)

var type = inputSchema.GetColumnType(srcCol);
string reason = TestType(type);
if (reason != null)
//IVAN: not sure about schema mismatch
Copy link
Contributor

@Zruty0 Zruty0 Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t sure about schema mismatch [](start = 26, length = 28)

no, it looks right #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look on TestType and what kind of string it returns.
No way I can convert it to current SchemaMismatch wording.


In reply to: 217835064 [](ancestors = 217835064)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yuck. Ok, then do ExceptParam


In reply to: 217836776 [](ancestors = 217836776,217835064)

var colMetaInfo = new ColumnMetadataInfo(_parent.ColumnPairs[i].output);
foreach (var type in InputSchema.GetMetadataTypes(colIndex).Where(x => x.Key == MetadataUtils.Kinds.SlotNames || x.Key == MetadataUtils.Kinds.IsNormalized))
Utils.MarshalInvoke(AddMetaGetter<int>, type.Value.RawType, colMetaInfo, InputSchema, type.Key, type.Value, colIndex);
result[i] = new RowMapperColumnInfo(_parent.ColumnPairs[i].output, _parent._types[i], colMetaInfo);
Copy link
Contributor

@Zruty0 Zruty0 Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colMetaInfo [](start = 106, length = 11)

can we just do RowColumnUtils.GetMetadataAsRow(InputSchema, colIndex, x=> x == SlotNames || x == IsNormalized) ? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the whole purpose of this 'metadata as IRow' exercise, to avoid creating such 'identity getters' for metadata


In reply to: 217836483 [](ancestors = 217836483)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -18,31 +18,32 @@
using Microsoft.ML.Runtime.Internal.Utilities;
using Microsoft.ML.Runtime.Model;
using Microsoft.ML.Runtime.Model.Onnx;
using Microsoft.ML.Core.Data;
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using Microsoft.ML.Core.Data [](start = 0, length = 28)

Why VS can't sort it properly, why, oh why? #Resolved

Copy link
Member

@sfilipi sfilipi Sep 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after re-activating resharper, the sorting problem went away.

also, are we doing system, than Microsoft.ML or vice versa in ML.NET?


In reply to: 217855717 [](ancestors = 217855717)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do what Ctrl-R-G does (which is alphabetically)


In reply to: 217875183 [](ancestors = 217875183,217855717)

@@ -20,28 +20,28 @@ namespace Microsoft.ML.Runtime.Data
/// <include file='doc.xml' path='doc/members/member[@name="NAHandle"]'/>
public static class NAHandleTransform
{
public enum ReplacementKind
public enum ReplacementKind:byte
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:byte [](start = 35, length = 5)

ctrl+k+d #Resolved

// creating output columns that are identical to the input columns except for replacing NA values
// with either the default value, user input, or imputed values (min/max/mean are currently supported).
// Imputation modes are supported for vectors both by slot and across all slots.
// REVIEW: May make sense to implement the transform template interface.
Copy link
Member

@sfilipi sfilipi Sep 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove it? why remove the doc link? #Closed

@@ -14444,7 +14444,7 @@ public MissingValuesRowDropperPipelineStep(Output output)

namespace Legacy.Transforms
{
public enum NAReplaceTransformReplacementKind
public enum NAReplaceTransformReplacementKind : byte
Copy link
Contributor

@Zruty0 Zruty0 Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: byte [](start = 53, length = 7)

fix codegen? Or is it already good? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already good


In reply to: 218248180 [](ancestors = 218248180)


var est = data.MakeNewEstimator().
Append(row => (
A: row.ScalarString.NAReplace(),
Copy link
Contributor

@Zruty0 Zruty0 Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAReplace [](start = 39, length = 9)

ReplaceMissingValues( #Resolved


[assembly: LoadableClass(typeof(NAReplaceTransform), typeof(NAReplaceTransform.Arguments), typeof(SignatureDataTransform),
NAReplaceTransform.FriendlyName, NAReplaceTransform.LoadName, "NAReplace", NAReplaceTransform.ShortName, DocName = "transform/NAHandle.md")]
[assembly: LoadableClass(NAReplaceTransform.Summary, typeof(IDataView), typeof(NAReplaceTransform), null, typeof(SignatureLoadDataTransform),
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Sep 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDataView [](start = 60, length = 9)

IDataTransform #Resolved

@Ivanidzo4ka Ivanidzo4ka changed the title WIP NAReplace estimator NAReplace estimator Sep 18, 2018
@Ivanidzo4ka Ivanidzo4ka reopened this Sep 18, 2018
Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka Ivanidzo4ka merged commit 731381c into dotnet:master Sep 19, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants