-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Convert categorical hash to estimator #1033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -116,7 +116,7 @@ public bool TryUnparse(StringBuilder sb) | |||
} | |||
} | |||
|
|||
public sealed class ColumnInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sealed [](start = 15, length = 6)
If it's not sealed, it should be abstract #Resolved
OutputKind = outputKind | ||
}; | ||
return Create(env, args, input); | ||
return new CategoricalHashEstimator(env, name, source, outputKind).Fit(input).Transform(input) as IDataView; | ||
} | ||
|
||
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
can we make this private? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public CategoricalHashTransform(HashEstimator hash, IEstimator<ITransformer> keyToVector, IDataView input) | ||
{ | ||
var chain = hash.Append(keyToVector); | ||
_transformer = chain.Fit(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code belongs to the caller I think. I don't feel comfortable having a public ctor for transformer that takes estimators as args. Also, there should not be a public ctor for the transformer, if it's traiunable. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not getting your message, but I can make constructor internal.
In reply to: 220399096 [](ancestors = 220399096)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public bool IsRowToRowMapper => _transformer.IsRowToRowMapper; | ||
|
||
public IRowToRowMapper GetRowToRowMapper(ISchema inputSchema) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ [](start = 8, length = 1)
=>
as well? It should check #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the inner method will check for the schema anyway
so you can use =>
here
In reply to: 221013063 [](ancestors = 221013063,220399209)
} | ||
} | ||
|
||
public sealed class CategoricalHashEstimator : IEstimator<CategoricalHashTransform> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 4, length = 6)
comment #Closed
|
||
public sealed class CategoricalHashEstimator : IEstimator<CategoricalHashTransform> | ||
{ | ||
public static class Defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
internal? #Closed
private static IDataTransform CreateTransformCore(CategoricalTransform.OutputKind argsOutputKind, OneToOneColumn[] columns, | ||
List<CategoricalTransform.OutputKind?> columnOutputKinds, IDataTransform input, IHost h, Arguments catHashArgs = null) | ||
private readonly IHost _host; | ||
private readonly IEstimator<ITransformer> _keyToSomething; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_keyToSomething [](start = 50, length = 15)
let's call it 'toVector' :) #Closed
I know that it's how it used to work, but I think we should do better than this. Either have 2 estimators for different sets of columns, or throw. In reply to: 424549622 [](ancestors = 424549622) Refers to: src/Microsoft.ML.Transforms/CategoricalHashTransform.cs:259 in cc70940. [](commit_id = cc70940, deletion_comment = False) |
} | ||
|
||
[Fact] | ||
public void CategoricalHashStatic() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CategoricalHashStatic [](start = 20, length = 21)
move to static pipeline tests assembly #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you create separate issue, since we have bunch (probably most of them) in Tests now?
In reply to: 220399916 [](ancestors = 220399916)
var data = new[] { | ||
new TestMeta() { A=new string[2] { "A", "B"}, B="C", C= new float[2] { 1.0f,2.0f}, D = 1.0f , E= new string[2]{ "A","D"}, F="D"}, | ||
new TestMeta() { A=new string[2] { "A", "B"}, B="C",C=new float[2] { 3.0f,4.0f}, D = -1.0f ,E= new string[2]{"E", "A"}, F="E"}, | ||
new TestMeta() { A=new string[2] { "A", "B"}, B="C",C=new float[2] { 5.0f,6.0f}, D = 1.0f ,E= new string[2]{ "D", "E"}, F="D"} }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F="D" [](start = 136, length = 5)
spacing #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using (var ch = Env.Start("save")) | ||
{ | ||
var saver = new TextSaver(Env, new TextSaver.Arguments { Silent = true }); | ||
IDataView savedData = TakeFilter.Create(Env, est.Fit(data).Transform(data).AsDynamic, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDataView [](start = 16, length = 9)
var #Resolved
@@ -171,16 +171,14 @@ public CategoricalTransform(TermEstimator term, IEstimator<ITransformer> keyToVe | |||
|
|||
public bool IsRowToRowMapper => _transformer.IsRowToRowMapper; | |||
|
|||
public IRowToRowMapper GetRowToRowMapper(ISchema inputSchema) | |||
{ | |||
Contracts.CheckValue(inputSchema, nameof(inputSchema)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contrac [](start = 11, length = 8)
did you intend to omit the check? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -455,13 +453,13 @@ private sealed class Rec : EstimatorReconciler | |||
} | |||
|
|||
/// <summary> | |||
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array | |||
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array. | |||
/// </summary> | |||
/// <param name="input">Incoming data.</param> | |||
/// <param name="outputKind">Specify output type of indicator array: array or binary encoded data.</param> | |||
/// <param name="order">How Id for each value would be assigined: by occurrence or by value.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How Id [](start = 31, length = 7)
think it would read better: "How the Id" #Resolved
@@ -455,13 +453,13 @@ private sealed class Rec : EstimatorReconciler | |||
} | |||
|
|||
/// <summary> | |||
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array | |||
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array. | |||
/// </summary> | |||
/// <param name="input">Incoming data.</param> | |||
/// <param name="outputKind">Specify output type of indicator array: array or binary encoded data.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specify output type [](start = 37, length = 19)
think it would read better: "Specify the output" #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col.Source = column.Name; | ||
col.Bag = bag; | ||
cols.Add(col); | ||
if (kind == CategoricalTransform.OutputKind.Bin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [](start = 20, length = 2)
fold the if
into the switch
above?
Convert Cathash to estimator