
Remove the IFourierDistributionSampler interface #2698


Merged
merged 3 commits into from
Feb 26, 2019

Conversation

yaeldekel

Fixes #2659,
fixes #699.


public interface IFourierDistributionSampler : ICanSaveModel
public abstract class RngGeneratorBase
Member

@eerhardt eerhardt Feb 22, 2019


In general, .NET APIs prefer to write names out in full rather than abbreviate them (as in Rng). Also, I assume the g in Rng stands for "generator", so RngGenerator is a little redundant.

https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/general-naming-conventions#using-abbreviations-and-acronyms

X DO NOT use abbreviations or contractions as part of identifier names.

For example, use GetWindow rather than GetWin.


Can we add some public XML docs to this new public class?

Member


Maybe it would be good to tie this to Fourier as well? Since this is in the Microsoft.ML.Transforms namespace, a class named RngGeneratorBase seems a bit broad, especially one that doesn't have any public members.


In reply to: 259486033

Author


Yes, I was going to mention that I'm not happy with these class names... I wrote an explanation of these classes in a separate comment; I'll also add some XML comments and post another iteration of this PR.


In reply to: 259486033

float Next(Random rand);
internal abstract float Dist(in VBuffer<float> first, in VBuffer<float> second);

internal abstract RandomNumberGeneratorBase GetRandomNumberGenerator(float avgDist);
Member

@eerhardt eerhardt Feb 22, 2019


What's the difference between RngGeneratorBase and RandomNumberGeneratorBase?

@yaeldekel
Author

yaeldekel commented Feb 22, 2019

The problem with IFourierDistributionSampler, as Eric mentioned in issue #699, is that creating one requires a float value (avgDist), but the way avgDist is computed depends on the implementation of the interface. So in this PR, I am splitting this interface into two base classes instead. The first one is used to help compute avgDist, and it is then used to create an instance of the second one. The second one has the Next API that produces random numbers.
I wasn't able to find good names for these classes, so I would appreciate any ideas :). I called the first class RngGenerator because it knows how to generate a random number generator...
If we leave the second class internal, perhaps it could stay RandomNumberGeneratorBase?
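A minimal sketch of the two-step flow described above, under simplified assumptions: the first object supplies the distance used to compute avgDist and then builds the second object, which only produces random numbers. KernelBase, RandomNumberGeneratorBase, Distance, GetRandomNumberGenerator and Next mirror names that appear in this PR's diff, but here they are public and take plain float[] instead of VBuffer<float>. ToyL1Kernel, its ScaledUniform sampler, TwoStepDemo and ComputeAverageDistance are hypothetical and exist only to make the flow runnable; the real transform may also use a different statistic than the mean for avgDist.

```csharp
using System;
using System.Collections.Generic;

// First object: knows its distance and how to build a sampler once avgDist is known.
public abstract class KernelBase
{
    public abstract float Distance(float[] first, float[] second);
    public abstract RandomNumberGeneratorBase GetRandomNumberGenerator(float avgDist);
}

// Second object: only produces random numbers.
public abstract class RandomNumberGeneratorBase
{
    public abstract float Next(Random rand);
}

// Hypothetical toy kernel (L1 distance, uniform sampler) used only to exercise the flow.
public sealed class ToyL1Kernel : KernelBase
{
    public override float Distance(float[] first, float[] second)
    {
        float sum = 0;
        for (int i = 0; i < first.Length; i++)
            sum += Math.Abs(first[i] - second[i]);
        return sum;
    }

    public override RandomNumberGeneratorBase GetRandomNumberGenerator(float avgDist)
        => new ScaledUniform(1 / avgDist);

    // Hypothetical sampler: uniform on [0, scale]; not a real kernel's Fourier distribution.
    private sealed class ScaledUniform : RandomNumberGeneratorBase
    {
        private readonly float _scale;
        public ScaledUniform(float scale) => _scale = scale;
        public override float Next(Random rand) => _scale * (float)rand.NextDouble();
    }
}

public static class TwoStepDemo
{
    // Hypothetical helper: mean pairwise distance, computed with the kernel's own Distance.
    public static float ComputeAverageDistance(KernelBase kernel, IReadOnlyList<float[]> rows)
    {
        if (rows.Count < 2)
            throw new ArgumentException("Need at least two rows.", nameof(rows));
        double total = 0;
        int count = 0;
        for (int i = 0; i < rows.Count; i++)
            for (int j = i + 1; j < rows.Count; j++)
            {
                total += kernel.Distance(rows[i], rows[j]);
                count++;
            }
        return (float)(total / count);
    }
}
```

The point of the split is that the kernel object can exist before avgDist is known; only the second object is parameterized by it, which removes the need to pass avgDist to a constructor whose class also defines how avgDist is computed.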

[assembly: LoadableClass(typeof(LaplacianFourierSampler), null, typeof(SignatureLoadModel),
"Laplacian Fourier Sampler Executor", "LaplacianSamplerExecutor", LaplacianFourierSampler.LoaderSignature)]
[assembly: LoadableClass(typeof(LaplacianRngGenerator.RandomNumberGenerator), null, typeof(SignatureLoadModel),
"Laplacian Fourier Sampler Executor", "LaplacianSamplerExecutor", LaplacianRngGenerator.RandomNumberGenerator.LoaderSignature)]

// REVIEW: Roll all of this in with the RffTransform.
namespace Microsoft.ML.Transforms
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


I would remove this comment and change the namespace to Microsoft.ML.Transforms.Projections to align with RFF. #Closed

{
float Next(Random rand);
internal abstract float Dist(in VBuffer<float> first, in VBuffer<float> second);
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


Dist

I know it's internal, but since we're touching this, can we call it Distance? #Closed

@codecov

codecov bot commented Feb 22, 2019

Codecov Report

Merging #2698 into master will increase coverage by 0.08%.
The diff coverage is 84.95%.

@@            Coverage Diff             @@
##           master    #2698      +/-   ##
==========================================
+ Coverage   71.58%   71.67%   +0.08%     
==========================================
  Files         805      808       +3     
  Lines      142025   142288     +263     
  Branches    16130    16136       +6     
==========================================
+ Hits       101674   101981     +307     
+ Misses      35910    35869      -41     
+ Partials     4441     4438       -3
Flag Coverage Δ
#Debug 71.67% <84.95%> (+0.08%) ⬆️
#production 67.92% <84.68%> (+0.04%) ⬆️
#test 85.86% <100%> (+0.12%) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.StaticPipe/TransformsStatic.cs 68.28% <ø> (ø) ⬆️
...icrosoft.ML.Transforms/RandomFourierFeaturizing.cs 83.41% <100%> (ø) ⬆️
test/Microsoft.ML.Tests/Transformers/RffTests.cs 100% <100%> (ø) ⬆️
...rosoft.ML.Transforms/FourierDistributionSampler.cs 84.16% <83.16%> (-1.88%) ⬇️
...osoft.ML.Data/DataLoadSave/Binary/UnsafeTypeOps.cs 71.55% <0%> (-6.43%) ⬇️
src/Microsoft.ML.Data/EntryPoints/InputBase.cs 70.96% <0%> (-4.84%) ⬇️
src/Microsoft.ML.Data/Data/Conversion.cs 67.19% <0%> (-2.48%) ⬇️
...rosoft.ML.Transforms/MissingValueReplacingUtils.cs 34.81% <0%> (-2.04%) ⬇️
src/Microsoft.ML.Core/Data/ColumnTypeExtensions.cs 84.81% <0%> (-1.86%) ⬇️
test/Microsoft.ML.Functional.Tests/Common.cs 98.54% <0%> (-1.46%) ⬇️
... and 111 more


public interface IFourierDistributionSampler : ICanSaveModel
public abstract class RngGeneratorBase
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


RngGeneratorBase

I would call it DistributionBase, or KernelBase.

Matrix Generator
The RFF transform produces data whose inner product approximates a kernel-dot-product in the original data space. This transform can approximate two kernels: the Gaussian kernel and the Laplacian kernel. Each of these kernels has one user-defined numeric parameter (gamma for the Gaussian kernel and A for the Laplacian kernel), which can be specified by clicking the wrench button next to the kernel name. #Closed
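To illustrate "inner product approximates a kernel-dot-product", here is a minimal sketch for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2), following the standard random Fourier features construction of Rahimi and Recht: draw each w from the kernel's Fourier transform, N(0, 2 * gamma * I), draw b uniformly from [0, 2π), and map x to sqrt(2/D) * cos(w · x + b). RffSketch and all of its methods are hypothetical; this is not the ML.NET RandomFourierFeaturizing code, which, judging by this PR, also divides gamma by avgDist before sampling.

```csharp
using System;

public static class RffSketch
{
    // Box-Muller transform: one standard normal sample from two uniforms.
    static double NextGaussian(Random rand)
    {
        double u1 = 1.0 - rand.NextDouble(); // in (0, 1], so Log(u1) is finite
        double u2 = rand.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    // Draws w_k ~ N(0, 2 * gamma * I) and b_k ~ Uniform[0, 2 * pi).
    public static (double[][] w, double[] b) Sample(int numFeatures, int dim, double gamma, Random rand)
    {
        double std = Math.Sqrt(2 * gamma);
        var w = new double[numFeatures][];
        var b = new double[numFeatures];
        for (int k = 0; k < numFeatures; k++)
        {
            w[k] = new double[dim];
            for (int i = 0; i < dim; i++) w[k][i] = std * NextGaussian(rand);
            b[k] = 2 * Math.PI * rand.NextDouble();
        }
        return (w, b);
    }

    // Maps x to D features z_k(x) = sqrt(2 / D) * cos(w_k . x + b_k).
    public static double[] Featurize(double[] x, double[][] w, double[] b)
    {
        int d = w.Length;
        var z = new double[d];
        for (int k = 0; k < d; k++)
        {
            double dot = 0;
            for (int i = 0; i < x.Length; i++) dot += w[k][i] * x[i];
            z[k] = Math.Sqrt(2.0 / d) * Math.Cos(dot + b[k]);
        }
        return z;
    }

    public static double Dot(double[] a, double[] c)
    {
        double s = 0;
        for (int i = 0; i < a.Length; i++) s += a[i] * c[i];
        return s;
    }

    // Exact Gaussian kernel, for comparison with the featurized inner product.
    public static double GaussianKernel(double[] x, double[] y, double gamma)
    {
        double sq = 0;
        for (int i = 0; i < x.Length; i++) sq += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.Exp(-gamma * sq);
    }
}
```

With a few thousand features, Dot(Featurize(x), Featurize(y)) lands close to GaussianKernel(x, y); the error shrinks roughly like 1/sqrt(D).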

}

[TlcModule.ComponentKind("FourierDistributionSampler")]
internal interface IFourierDistributionSamplerFactory : IComponentFactory<float, IFourierDistributionSampler>
internal abstract class RandomNumberGeneratorBase
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


internal abstract class RandomNumberGeneratorBase

Any reason why it's not an interface? #Closed

public RandomNumberGenerator(float gamma, float avgDist)
: base()
{
_gamma = gamma / avgDist;
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


avgDist

It would be nice to have an Assert for a nonzero value. #Closed
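A minimal sketch of the requested check for the Gaussian sampler, assuming the constructor shape shown above (_gamma = gamma / avgDist) and a per-coordinate standard deviation of sqrt(2 * _gamma), which matches the MathUtils.Sqrt(2 * _gamma) call mentioned later in this review. The class name GaussianRandomNumberGenerator is hypothetical, and plain .NET exceptions are used instead of ML.NET's internal assertion helpers so the sketch compiles on its own.

```csharp
using System;

// Hypothetical stand-in for the Gaussian kernel's RandomNumberGenerator.
public sealed class GaussianRandomNumberGenerator
{
    private readonly float _gamma;

    public GaussianRandomNumberGenerator(float gamma, float avgDist)
    {
        // gamma / avgDist is meaningless for avgDist == 0 (e.g. all sampled rows identical).
        // The negated comparisons also reject NaN.
        if (!(avgDist > 0))
            throw new ArgumentOutOfRangeException(nameof(avgDist), "avgDist must be positive.");
        if (!(gamma > 0))
            throw new ArgumentOutOfRangeException(nameof(gamma), "gamma must be positive.");
        _gamma = gamma / avgDist;
    }

    // Draws one coordinate of w ~ N(0, 2 * _gamma) via the Box-Muller transform.
    public float Next(Random rand)
    {
        double u1 = 1.0 - rand.NextDouble(); // in (0, 1], so Log(u1) is finite
        double u2 = rand.NextDouble();
        double standardNormal = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        return (float)(Math.Sqrt(2.0 * _gamma) * standardNormal);
    }
}
```

A runtime check (rather than a debug-only assert) is a design choice here: a zero avgDist can come from user data, not only from a programming error.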

{
public sealed class Options : IFourierDistributionSamplerFactory
public sealed class Options : IComponentFactory<RngGeneratorBase>
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


Options

Does it still need to be public? #Closed


public RandomNumberGenerator(float a, float avgDist)
{
_a = a / avgDist;
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 22, 2019


avgDist

Assert for a nonzero value. #Closed
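The same guard for the Laplacian sampler, as a minimal sketch. It assumes the constructor shape shown above (_a = a / avgDist) and that each coordinate of w is drawn from the Fourier transform of exp(-_a * |t|), which is a Cauchy distribution with scale _a; that sampling choice is the textbook one for the Laplacian kernel, not something this PR shows. LaplacianRandomNumberGenerator is a hypothetical name.

```csharp
using System;

// Hypothetical stand-in for the Laplacian kernel's RandomNumberGenerator.
public sealed class LaplacianRandomNumberGenerator
{
    private readonly float _a;

    public LaplacianRandomNumberGenerator(float a, float avgDist)
    {
        // Reject zero, negative and NaN values before dividing.
        if (!(avgDist > 0))
            throw new ArgumentOutOfRangeException(nameof(avgDist), "avgDist must be positive.");
        if (!(a > 0))
            throw new ArgumentOutOfRangeException(nameof(a), "a must be positive.");
        _a = a / avgDist;
    }

    // Inverse-CDF sampling of Cauchy(0, _a): _a * tan(pi * (u - 0.5)).
    public float Next(Random rand)
    {
        double u = rand.NextDouble();
        return (float)(_a * Math.Tan(Math.PI * (u - 0.5)));
    }
}
```

A quick sanity check for a Cauchy(0, s) sampler is that about half of its samples satisfy |x| < s.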

/// </summary>
[BestFriend]
internal delegate void SignatureFourierDistributionSampler(float avgDist);
internal delegate void SignatureRngGenerator();
Member

@eerhardt eerhardt Feb 25, 2019


Should we rename this for the new name? - SignatureKernelBase? #Resolved

distances[count++] = gaussian ? VectorUtils.L2DistSquared(in res[i], in res[j])
: VectorUtils.L1Distance(in res[i], in res[j]);
}
distances[count++] = columns[iinfo].Generator.Distance(in res[i], in res[j]);
Member

@eerhardt eerhardt Feb 25, 2019


You may want to factor out columns[iinfo].Generator into a new variable (here and below). That way we aren't indexing into the array over and over inside these nested for loops. #Resolved
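The suggested refactoring amounts to reading columns[iinfo].Generator once into a local before the nested loops. A minimal sketch, assuming hypothetical ColumnInfo and IKernelDistance types that stand in for the transformer's per-column state and the generator's Distance method, with plain float[] in place of VBuffer<float>:

```csharp
using System;

// Hypothetical stand-in for the object exposing Distance (KernelBase in the PR).
public interface IKernelDistance
{
    float Distance(float[] first, float[] second);
}

public sealed class L1Distance : IKernelDistance
{
    public float Distance(float[] first, float[] second)
    {
        float sum = 0;
        for (int i = 0; i < first.Length; i++) sum += Math.Abs(first[i] - second[i]);
        return sum;
    }
}

// Hypothetical per-column state holding the generator.
public sealed class ColumnInfo
{
    public IKernelDistance Generator;
}

public static class PairwiseDistances
{
    public static float[] Compute(ColumnInfo[] columns, int iinfo, float[][] res)
    {
        // Read the generator once instead of indexing columns[iinfo] in every iteration.
        var generator = columns[iinfo].Generator;
        int n = res.Length;
        var distances = new float[n * (n - 1) / 2];
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                distances[count++] = generator.Distance(res[i], res[j]);
        return distances;
    }
}
```

Besides avoiding a repeated array load and field read (which the JIT is not guaranteed to hoist), the local makes each loop body a single, readable call.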

Member

@eerhardt eerhardt left a comment


Looks really good, @yaeldekel. I just had a few clean up items.

}

internal const string LoadName = "GaussianRandom";

private readonly float _gamma;

public GaussianFourierSampler(IHostEnvironment env, Options options, float avgDist)
public GaussianKernel(float gamma = 1)
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 25, 2019


GaussianKernel

It would be nice to have a comment.
Also, regarding MathUtils.Sqrt(2 * _gamma): since I doubt the average distance can be negative, and _gamma in the random number generator is gamma / avgDistance, we need to specify that gamma is a non-negative number, and assert it. #Resolved
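A short derivation behind the Sqrt(2 * _gamma) remark, assuming the Gaussian kernel is applied to the squared L2 distance scaled by the average distance, i.e. with γ′ = γ / d̄ where d̄ is avgDist (as the _gamma = gamma / avgDist assignment suggests). Writing δ = x − y, Bochner's theorem gives the sampling density of w as the normalized Fourier transform of the kernel:

```latex
\begin{aligned}
k(\delta) &= \exp\!\left(-\gamma' \lVert \delta \rVert_2^2\right), \qquad \gamma' = \gamma / \bar{d}, \\
p(w) &= \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} k(\delta)\, e^{-i w^\top \delta}\, d\delta
      = (4\pi\gamma')^{-d/2} \exp\!\left(-\frac{\lVert w \rVert_2^2}{4\gamma'}\right), \\
&\Rightarrow\; w \sim \mathcal{N}(0,\, 2\gamma' I), \qquad \sigma = \sqrt{2\gamma'}.
\end{aligned}
```

The sampling standard deviation is therefore sqrt(2γ′), which is only real for γ′ ≥ 0; with d̄ > 0 that requires gamma ≥ 0, and gamma = 0 degenerates to the constant kernel k ≡ 1, so gamma > 0 is the useful range to document and check.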

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment


:shipit:

private readonly float _a;

public LaplacianFourierSampler(IHostEnvironment env, Options options, float avgDist)
public LaplacianKernel(float a = 1)
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 25, 2019


It would be nice to have a comment. #Resolved

}

public sealed class GaussianFourierSampler : IFourierDistributionSampler
public sealed class GaussianKernel : KernelBase
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 25, 2019


GaussianKernel

It would be nice to have a comment. #Resolved

}
}

public sealed class LaplacianFourierSampler : IFourierDistributionSampler
public sealed class LaplacianKernel : KernelBase
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 25, 2019


LaplacianKernel

It would be nice to have a comment. #Resolved

@yaeldekel yaeldekel merged commit 4dd0d5d into dotnet:master Feb 26, 2019
@yaeldekel yaeldekel deleted the fourierdistributionsampler branch February 26, 2019 22:15
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Labels
None yet
Projects
None yet
4 participants