Samples for categorical transform estimators #3179

Merged: 5 commits, Apr 4, 2019

@abgoswam abgoswam commented Apr 2, 2019

Towards #1209

This PR makes the following changes:

  • Adds a sample for the OneHotHashEncoding transform estimator.
  • Updates the sample for the OneHotEncoding transform estimator.

codecov bot commented Apr 2, 2019

Codecov Report

Merging #3179 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3179      +/-   ##
==========================================
+ Coverage   72.54%   72.58%   +0.04%     
==========================================
  Files         807      807              
  Lines      144774   144956     +182     
  Branches    16208    16212       +4     
==========================================
+ Hits       105021   105212     +191     
+ Misses      35339    35326      -13     
- Partials     4414     4418       +4

| Flag | Coverage Δ |
|---|---|
| #Debug | 72.58% <ø> (+0.04%) ⬆️ |
| #production | 68.14% <ø> (+0.01%) ⬆️ |
| #test | 88.88% <ø> (+0.05%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/Microsoft.ML.Transforms/CategoricalCatalog.cs | 68.42% <ø> (ø) ⬆️ |
| src/Microsoft.ML.DataView/KeyDataViewType.cs | 74.57% <0%> (-3.76%) ⬇️ |
| src/Microsoft.ML.Maml/MAML.cs | 24.75% <0%> (-1.46%) ⬇️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.26% <0%> (-0.63%) ⬇️ |
| test/Microsoft.ML.Tests/ImagesTests.cs | 98.69% <0%> (-0.13%) ⬇️ |
| ...Microsoft.ML.Tests/Transformers/NormalizerTests.cs | 100% <0%> (ø) ⬆️ |
| ...ML.Data/Transforms/ConversionsExtensionsCatalog.cs | 44.87% <0%> (ø) ⬆️ |
| src/Microsoft.ML.Transforms/Text/TextCatalog.cs | 41.66% <0%> (ø) ⬆️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 84.9% <0%> (+0.2%) ⬆️ |
| ...rosoft.ML.ImageAnalytics/VectorToImageTransform.cs | 76.77% <0%> (+4.53%) ⬆️ |
| ... and 3 more | |


// A pipeline for one hot encoding the Education column.
var bagPipeline = mlContext.Transforms.Categorical.OneHotEncoding("EducationOneHotEncoded", "Education", OutputKind.Bag);

@sfilipi sfilipi Apr 3, 2019

", OutputKind.Bag);

I would leave it, so that it makes sense why we call it bagPipeline. #Closed

Member Author

I am using the default (which is Indicator, not Bag). I also renamed it to pipeline.


In reply to: 271811206
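
For readers following the thread, here is a minimal sketch of what the renamed pipeline might look like when the output kind is left at its default (Indicator). It is an illustration, not the code merged in this PR: the `DataPoint` class, the sample values, and the class name `OneHotEncodingSketch` are assumptions, and it presumes the Microsoft.ML 1.x NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class OneHotEncodingSketch
{
    // Hypothetical input type, reduced to the one column the pipeline uses.
    private class DataPoint
    {
        public string Education { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample rows, for illustration only.
        var samples = new List<DataPoint>
        {
            new DataPoint { Education = "0-5yrs" },
            new DataPoint { Education = "6-11yrs" },
            new DataPoint { Education = "12+yrs" },
            new DataPoint { Education = "0-5yrs" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // No OutputKind argument: the estimator falls back to its default, Indicator.
        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(
            "EducationOneHotEncoded", "Education");

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Each row is a vector with one slot per distinct Education value.
        foreach (float[] row in transformed.GetColumn<float[]>("EducationOneHotEncoded"))
            Console.WriteLine(string.Join(" ", row));
    }
}
```

Passing `OutputKind.Bag` as a third argument would request the bag encoding discussed in the comment above instead.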

// 1 0 0 0 1
// 0 1 0 1 0
// 0 1 0 0 1
// 0 0 1 1 0
}

@sfilipi sfilipi Apr 3, 2019

make it a separate example, because the multi-output is a different API. #Closed

Member Author

made separate example for multi input


In reply to: 271811670

// 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
// 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
// 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
}

@sfilipi sfilipi Apr 3, 2019

separate example. #Resolved

Member Author

created separate example for multi column


In reply to: 271811872

private class DataPoint
{
public float Label { get; set; }

public string Education { get; set; }

public string ZipCode { get; set; }

@shmoradims shmoradims Apr 4, 2019

public string ZipCode { get; set; }

please remove since it's not used here #Resolved


public string Education { get; set; }

public string ZipCode { get; set; }

@shmoradims shmoradims Apr 4, 2019

public string ZipCode { get; set; }

please remove since it's not used here #Resolved

Console.Write($"{row[i]}\t");
Console.WriteLine();
}
}
private class DataPoint
{
public float Label { get; set; }

@shmoradims shmoradims Apr 4, 2019

public float Label { get; set; }

please remove #Resolved

@shmoradims shmoradims left a comment

:shipit:

@@ -1,6 +1,5 @@
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.OneHotEncodingEstimator;

@shmoradims shmoradims Apr 4, 2019

what is this for? can we remove it?
#Resolved

Member Author

this is for the OutputKind.Key parameter that we use in the example below


In reply to: 272292647
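
As a rough illustration of the answer above: `OutputKind` is an enum nested inside `OneHotEncodingEstimator`, so the `using static` directive is what lets the sample write `OutputKind.Key` unqualified. The sketch below assumes the Microsoft.ML 1.x package; the class and variable names are hypothetical and not taken from the PR.

```csharp
using Microsoft.ML;
using static Microsoft.ML.Transforms.OneHotEncodingEstimator;

public static class OutputKindSketch
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // With the using static directive, the nested enum is in scope directly.
        var keyPipeline = mlContext.Transforms.Categorical.OneHotEncoding(
            "EducationOneHotEncoded", "Education", OutputKind.Key);

        // Without the directive, the same call needs the fully qualified name.
        var sameKeyPipeline = mlContext.Transforms.Categorical.OneHotEncoding(
            "EducationOneHotEncoded", "Education",
            Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind.Key);
    }
}
```

Both calls build the same estimator, so removing the directive is possible only if every `OutputKind` reference in the file is qualified.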

shmoradims commented Apr 4, 2019

namespace Microsoft.ML.Samples.Dynamic

let's also drop Microsoft.ML prefix while we're at it. #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Categorical/OneHotEncoding.cs:6 in 4cd6397.

@abgoswam abgoswam requested a review from rogancarr April 4, 2019 21:27

@sfilipi sfilipi left a comment

:shipit:

@abgoswam abgoswam merged commit 9d79ab3 into dotnet:master Apr 4, 2019
abgoswam added a commit to abgoswam/machinelearning that referenced this pull request Apr 5, 2019
* categorical transform estimators

* review comments

* fix review comments

* modify samples namespace
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022