Skip to content

Convert categorical hash to estimator #1033

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 3, 2018

Conversation

Ivanidzo4ka
Copy link
Contributor

Convert Cathash to estimator

@@ -116,7 +116,7 @@ public bool TryUnparse(StringBuilder sb)
}
}

public sealed class ColumnInfo
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sealed [](start = 15, length = 6)

If it's not sealed, it should be abstract #Resolved

OutputKind = outputKind
};
return Create(env, args, input);
return new CategoricalHashEstimator(env, name, source, outputKind).Fit(input).Transform(input) as IDataView;
}

public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input)
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

can we make this private? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call it in entrypoint, so internal


In reply to: 220398909 [](ancestors = 220398909)

public CategoricalHashTransform(HashEstimator hash, IEstimator<ITransformer> keyToVector, IDataView input)
{
var chain = hash.Append(keyToVector);
_transformer = chain.Fit(input);
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code belongs to the caller I think. I don't feel comfortable having a public ctor for transformer that takes estimators as args. Also, there should not be a public ctor for the transformer, if it's traiunable. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not getting your message, but I can make constructor internal.


In reply to: 220399096 [](ancestors = 220399096)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good enough.


In reply to: 221013014 [](ancestors = 221013014,220399096)

public bool IsRowToRowMapper => _transformer.IsRowToRowMapper;

public IRowToRowMapper GetRowToRowMapper(ISchema inputSchema)
{
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 8, length = 1)

=> as well? It should check #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what?


In reply to: 220399209 [](ancestors = 220399209)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the inner method will check for the schema anyway
so you can use => here


In reply to: 221013063 [](ancestors = 221013063,220399209)

}
}

public sealed class CategoricalHashEstimator : IEstimator<CategoricalHashTransform>
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 4, length = 6)

comment #Closed


public sealed class CategoricalHashEstimator : IEstimator<CategoricalHashTransform>
{
public static class Defaults
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

internal? #Closed

private static IDataTransform CreateTransformCore(CategoricalTransform.OutputKind argsOutputKind, OneToOneColumn[] columns,
List<CategoricalTransform.OutputKind?> columnOutputKinds, IDataTransform input, IHost h, Arguments catHashArgs = null)
private readonly IHost _host;
private readonly IEstimator<ITransformer> _keyToSomething;
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_keyToSomething [](start = 50, length = 15)

let's call it 'toVector' :) #Closed

@Zruty0
Copy link
Contributor

Zruty0 commented Sep 26, 2018

                        binaryEncoding = true;

the logic is 'if ANY output is binary, then ALL is binary', right? Doesn't sound right to me #Closed


Refers to: src/Microsoft.ML.Transforms/CategoricalHashTransform.cs:259 in cc70940. [](commit_id = cc70940, deletion_comment = False)

@Zruty0
Copy link
Contributor

Zruty0 commented Sep 26, 2018

                        binaryEncoding = true;

I know that it's how it used to work, but I think we should do better than this. Either have 2 estimators for different sets of columns, or throw.


In reply to: 424549622 [](ancestors = 424549622)


Refers to: src/Microsoft.ML.Transforms/CategoricalHashTransform.cs:259 in cc70940. [](commit_id = cc70940, deletion_comment = False)

}

[Fact]
public void CategoricalHashStatic()
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CategoricalHashStatic [](start = 20, length = 21)

move to static pipeline tests assembly #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create separate issue, since we have bunch (probably most of them) in Tests now?


In reply to: 220399916 [](ancestors = 220399916)

var data = new[] {
new TestMeta() { A=new string[2] { "A", "B"}, B="C", C= new float[2] { 1.0f,2.0f}, D = 1.0f , E= new string[2]{ "A","D"}, F="D"},
new TestMeta() { A=new string[2] { "A", "B"}, B="C",C=new float[2] { 3.0f,4.0f}, D = -1.0f ,E= new string[2]{"E", "A"}, F="E"},
new TestMeta() { A=new string[2] { "A", "B"}, B="C",C=new float[2] { 5.0f,6.0f}, D = 1.0f ,E= new string[2]{ "D", "E"}, F="D"} };
Copy link
Contributor

@Zruty0 Zruty0 Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F="D" [](start = 136, length = 5)

spacing #Closed

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka Ivanidzo4ka requested a review from sfilipi September 28, 2018 18:10
using (var ch = Env.Start("save"))
{
var saver = new TextSaver(Env, new TextSaver.Arguments { Silent = true });
IDataView savedData = TakeFilter.Create(Env, est.Fit(data).Transform(data).AsDynamic, 4);
Copy link
Member

@sfilipi sfilipi Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDataView [](start = 16, length = 9)

var #Resolved

@@ -171,16 +171,14 @@ public CategoricalTransform(TermEstimator term, IEstimator<ITransformer> keyToVe

public bool IsRowToRowMapper => _transformer.IsRowToRowMapper;

public IRowToRowMapper GetRowToRowMapper(ISchema inputSchema)
{
Contracts.CheckValue(inputSchema, nameof(inputSchema));
Copy link
Member

@sfilipi sfilipi Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrac [](start = 11, length = 8)

did you intend to omit the check? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @Zruty0 pointed out we run check inside _transformer.GetRowToRowMapper


In reply to: 221369314 [](ancestors = 221369314)

@@ -455,13 +453,13 @@ private sealed class Rec : EstimatorReconciler
}

/// <summary>
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array.
/// </summary>
/// <param name="input">Incoming data.</param>
/// <param name="outputKind">Specify output type of indicator array: array or binary encoded data.</param>
/// <param name="order">How Id for each value would be assigined: by occurrence or by value.</param>
Copy link
Member

@sfilipi sfilipi Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How Id [](start = 31, length = 7)

think it would read better: "How the Id" #Resolved

@@ -455,13 +453,13 @@ private sealed class Rec : EstimatorReconciler
}

/// <summary>
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array
/// Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array.
/// </summary>
/// <param name="input">Incoming data.</param>
/// <param name="outputKind">Specify output type of indicator array: array or binary encoded data.</param>
Copy link
Member

@sfilipi sfilipi Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify output type [](start = 37, length = 19)

think it would read better: "Specify the output" #Resolved

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

col.Source = column.Name;
col.Bag = bag;
cols.Add(col);
if (kind == CategoricalTransform.OutputKind.Bin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if [](start = 20, length = 2)

fold the if into the switch above?

Ivan Matantsev added 2 commits October 2, 2018 18:02
@Ivanidzo4ka Ivanidzo4ka merged commit c6c0d22 into dotnet:master Oct 3, 2018
@Zruty0 Zruty0 mentioned this pull request Oct 4, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants