NAReplace estimator #917
public readonly bool ImputeBySlot;
public readonly ReplacementKind Kind;

public ColumnInfo(string input, string output, ReplacementKind kind = ReplacementKind.DefaultValue, bool imputeBySlot = true)
ColumnInfo
summary comment #Resolved
summary: Describes how the transformer handles one column pair.
input: name of input column
output: name of output column
replacementMode: what to replace the missing value with
imputeBySlot: if true, per-slot imputation of replacement is performed. Otherwise, replacement value is imputed for the entire vector column. This setting is ignored for scalars and variable vectors, where imputation is always for the entire column.
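The difference between per-slot and whole-vector imputation described above can be sketched as follows. This is a minimal illustration, not the transform's actual code: `ImputeSketch`, `MeanIgnoringNaN`, and `ReplaceWithMean` are hypothetical names, the sketch handles only fixed-size float vectors with `NaN` as the missing value, and it assumes the mean is taken over the non-missing values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-alone sketch; none of these names come from ML.NET.
public static class ImputeSketch
{
    // Mean of the non-NaN values, or 0 when every value is missing (an assumption).
    private static float MeanIgnoringNaN(IEnumerable<float> values)
    {
        float[] present = values.Where(v => !float.IsNaN(v)).ToArray();
        return present.Length == 0 ? 0f : present.Average();
    }

    // rows: one fixed-size vector per row. Returns copies with NaN replaced.
    public static float[][] ReplaceWithMean(float[][] rows, bool imputeBySlot)
    {
        int slots = rows[0].Length;
        float[] replacement = new float[slots];

        if (imputeBySlot)
        {
            // One replacement value per slot, computed from that slot only.
            for (int s = 0; s < slots; s++)
                replacement[s] = MeanIgnoringNaN(rows.Select(r => r[s]));
        }
        else
        {
            // A single replacement value computed across all slots of all rows.
            float all = MeanIgnoringNaN(rows.SelectMany(r => r));
            for (int s = 0; s < slots; s++)
                replacement[s] = all;
        }

        return rows
            .Select(r => r.Select((v, s) => float.IsNaN(v) ? replacement[s] : v).ToArray())
            .ToArray();
    }
}
```

With rows `{1, NaN}`, `{3, 10}`, `{NaN, 20}`, per-slot imputation fills the gaps with 2 and 15 (each slot's own mean), while whole-vector imputation fills both gaps with 8.5 (the mean of all four present values).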
return columns.Select(x => (x.Input, x.Output)).ToArray();
}

///IVAN: move to mapper.
move to mapper.
Maybe just turn it into a single 'type'? The rest is accessible via _parent.ColumnPairs.
#Resolved
I'm actually quite tempted to move the whole ColInfo into MapperBase, since it's heavily used in almost every mapper I wrote.
But yeah, I have a _types array which I will use in the Estimator.
Moving ColInfo to MapperBase will be a bit unwieldy, I think.
var type = inputSchema.GetColumnType(srcCol);
string reason = TestType(type);
if (reason != null)
//IVAN: not sure about schema mismatch
not sure about schema mismatch
no, it looks right #Resolved
Look at TestType and what kind of string it returns.
There's no way I can convert it to the current SchemaMismatch wording.
var colMetaInfo = new ColumnMetadataInfo(_parent.ColumnPairs[i].output);
foreach (var type in InputSchema.GetMetadataTypes(colIndex).Where(x => x.Key == MetadataUtils.Kinds.SlotNames || x.Key == MetadataUtils.Kinds.IsNormalized))
Utils.MarshalInvoke(AddMetaGetter<int>, type.Value.RawType, colMetaInfo, InputSchema, type.Key, type.Value, colIndex);
result[i] = new RowMapperColumnInfo(_parent.ColumnPairs[i].output, _parent._types[i], colMetaInfo);
colMetaInfo
Can we just do RowColumnUtils.GetMetadataAsRow(InputSchema, colIndex, x => x == SlotNames || x == IsNormalized)? #Resolved
This is the whole purpose of this 'metadata as IRow' exercise: to avoid creating such 'identity getters' for metadata.
@@ -18,31 +18,32 @@
 using Microsoft.ML.Runtime.Internal.Utilities;
 using Microsoft.ML.Runtime.Model;
 using Microsoft.ML.Runtime.Model.Onnx;
+using Microsoft.ML.Core.Data;
using Microsoft.ML.Core.Data
Why can't VS sort it properly, why, oh why? #Resolved
After re-activating ReSharper, the sorting problem went away.
Also, are we doing System, then Microsoft.ML, or vice versa in ML.NET?
Let's do what Ctrl-R-G does (which sorts alphabetically).
@@ -20,28 +20,28 @@ namespace Microsoft.ML.Runtime.Data
 /// <include file='doc.xml' path='doc/members/member[@name="NAHandle"]'/>
 public static class NAHandleTransform
 {
-public enum ReplacementKind
+public enum ReplacementKind:byte
:byte
ctrl+k+d #Resolved
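For readers unfamiliar with the `: byte` suffix under discussion: it sets the enum's underlying type to `byte` instead of the default `int`. A minimal sketch, using a hypothetical `ReplacementKindSketch` enum whose member list and values are illustrative rather than copied from the PR:

```csharp
using System;

// Hypothetical stand-in for the enum in the diff; members and values are assumptions.
public enum ReplacementKindSketch : byte
{
    DefaultValue = 0,
    Mean = 1,
    Minimum = 2,
    Maximum = 3
}

public static class EnumSketch
{
    // Reports the underlying storage type declared after the colon.
    public static Type Underlying() => Enum.GetUnderlyingType(typeof(ReplacementKindSketch));

    // The cast to byte cannot overflow because byte is the enum's underlying type.
    public static byte ToByte(ReplacementKindSketch kind) => (byte)kind;
}
```

The `ctrl+k+d` remark refers to Visual Studio's Format Document command, which would normalize the spacing to `ReplacementKind : byte`.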
// creating output columns that are identical to the input columns except for replacing NA values
// with either the default value, user input, or imputed values (min/max/mean are currently supported).
// Imputation modes are supported for vectors both by slot and across all slots.
// REVIEW: May make sense to implement the transform template interface.
why remove it? why remove the doc link? #Closed
@@ -14444,7 +14444,7 @@ public MissingValuesRowDropperPipelineStep(Output output)

 namespace Legacy.Transforms
 {
-public enum NAReplaceTransformReplacementKind
+public enum NAReplaceTransformReplacementKind : byte
: byte
fix codegen? Or is it already good? #Resolved
var est = data.MakeNewEstimator().
    Append(row => (
        A: row.ScalarString.NAReplace(),
NAReplace
ReplaceMissingValues( #Resolved
[assembly: LoadableClass(typeof(NAReplaceTransform), typeof(NAReplaceTransform.Arguments), typeof(SignatureDataTransform),
    NAReplaceTransform.FriendlyName, NAReplaceTransform.LoadName, "NAReplace", NAReplaceTransform.ShortName, DocName = "transform/NAHandle.md")]
[assembly: LoadableClass(NAReplaceTransform.Summary, typeof(IDataView), typeof(NAReplaceTransform), null, typeof(SignatureLoadDataTransform),
IDataView
IDataTransform #Resolved
Converts NAReplace to estimator