Number of feature columns #2179

Closed
wschin opened this issue Jan 17, 2019 · 5 comments

@wschin
Member

wschin commented Jan 17, 2019

ML.NET has long assumed that a training pipeline contains only one feature column. We recently added the field-aware factorization machine, so that assumption no longer fully holds. We will have only two public APIs per trainer (please see #2047 for an example). To keep our public APIs consistent, we need to decide whether the feature column name should be an array or a scalar. Alternatively, we could introduce another API that accepts multiple feature (or even label) columns. @TomFinley, @eerhardt, any comments please?

@abgoswam
Member

abgoswam commented Jan 18, 2019

Currently, the public API of the field-aware factorization machine accepts a string[] for its features:

public static FieldAwareFactorizationMachineTrainer FieldAwareFactorizationMachine(this BinaryClassificationContext.BinaryClassificationTrainers ctx,
    string[] featureColumns,
    string labelColumn = DefaultColumnNames.Label,
    string weights = null,
    Action<FieldAwareFactorizationMachineTrainer.Arguments> advancedSettings = null)
{
    Contracts.CheckValue(ctx, nameof(ctx));

However, the advanced arguments class for FFM (FieldAwareFactorizationMachineTrainer.Arguments) takes a single string FeatureColumn.

This makes the public API of FieldAwareFactorizationMachine inconsistent once we move the advanced arguments into a separate API (related to the work we are doing in #1798).

@glebuk @sfilipi

@eerhardt
Member

If one algorithm (say field-aware factorization machine) can accept multiple feature columns, and other algorithms (say SDCA) can only accept a single feature column, I don't see a reason why the APIs across the two need to be consistent.

Why limit FFM to only allow one column when it can support many?
Why have SDCA allow multiple columns when it can only support one?

@glebuk
Contributor

glebuk commented Jan 18, 2019

I'd expect consistency between the "basic" and "advanced" arguments. Different trainers have different functionality and so require different types for features. Having string[] featureColumns in both the method and the arguments class seems reasonable.
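
To make the suggestion above concrete, here is a minimal sketch of a "basic" overload and an "advanced" options class that share the same string[] type for feature columns. FfmOptions and FfmSketch are hypothetical names invented for illustration; they are not ML.NET types, and this is not necessarily how PR #2205 resolved the issue.

```csharp
using System;

// Hypothetical types for illustration only; these are NOT the ML.NET API.
public sealed class FfmOptions
{
    // Same type as the parameter of the "basic" overload below.
    public string[] FeatureColumns = { "Features" };
    public string LabelColumn = "Label";
}

public static class FfmSketch
{
    // "Basic" overload: the caller passes the feature columns directly.
    public static FfmOptions Create(string[] featureColumns, string labelColumn = "Label")
    {
        if (featureColumns == null || featureColumns.Length == 0)
            throw new ArgumentException("At least one feature column is required.", nameof(featureColumns));
        return new FfmOptions { FeatureColumns = featureColumns, LabelColumn = labelColumn };
    }

    // "Advanced" overload: the options object carries the same string[] shape,
    // so both entry points describe features the same way.
    public static FfmOptions Create(FfmOptions options)
    {
        if (options == null) throw new ArgumentNullException(nameof(options));
        return Create(options.FeatureColumns, options.LabelColumn);
    }
}
```

With this shape, a trainer that supports only one feature column could keep a plain string in both places, while a multi-column trainer uses string[] in both.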

@wschin
Member Author

wschin commented Jan 19, 2019

The single-feature-column assumption doesn't sound great to me. Searching for references on FFM turns up more recent work that uses multiple feature columns. Just for reference, LIBFFM has more than 1k stars even though it implements only a single algorithm: training an FFM for binary classification.

@abgoswam
Member

Closing out this issue. PR #2205 fixed it.

shauheen added the enhancement (New feature or request) label on Feb 5, 2019
shauheen added this to the 0119 milestone on Feb 5, 2019
ghost locked the conversation as resolved and limited it to collaborators on Mar 25, 2022

5 participants