Proposal for Fluent API

In this issue I describe a proposal for a fluent API for building ML.NET learning pipelines. This API would be consistent with existing .NET patterns such as LINQ, allowing people new to ML.NET to pick it up easily. It would allow clear, concise code for simple scenarios, whilst allowing easy extension for more complex situations.

# Background

The `LearningPipeline` API used by the current preview releases of ML.NET has a number of limitations. The programming model does not fit in with other .NET code (we do not write other code as a series of steps added to a list), and it only supports a linear pipeline, without merging or branching (e.g. combining data from multiple sources, or splitting data into training and test sets).

The recent proposal for a major API change by @TomFinley in issue #371 is a big step forward towards a more natural programming model, with each step of the pipeline new-ed up in turn. However, I would argue that it no longer reflects the true flow through a learning pipeline, since each previous step is relegated to a constructor parameter of the next one. This proposal builds on top of #371 with a fluent API.

# Proposed API

By using extension methods (in a similar manner to LINQ), we can pass the previous step of a pipeline as the `this` parameter into subsequent steps, preserving the natural flow. For example:

```
var loader = new TextLoader(new MultiFileSource(dataPath),
        useHeader: true, separator: ',',
        cols: new[] { ... });
// Each step receives the previous one as its `this` argument.
var concatenated = loader.AddConcatTransform(env, "CategoryFeatures",
        "Bedrooms", "Bathrooms", "Floors", "Waterfront", "View", "Condition", "Grade",
        "YearBuilt", "YearRenovated", "Zipcode");
var transform = concatenated.AddCategoricalTransform("CategoryFeatures");
```

This could be cleaned up further by chaining the calls:

```
var pipeline = new TextLoader(new MultiFileSource(dataPath),
        useHeader: true, separator: ',',
        cols: new[] { ... })
    .AddConcatTransform(env, "CategoryFeatures",
        "Bedrooms", "Bathrooms", "Floors", "Waterfront", "View", "Condition", "Grade",
        "YearBuilt", "YearRenovated", "Zipcode")
    .AddCategoricalTransform("CategoryFeatures");
```
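To make the chaining mechanism concrete, here is a minimal, self-contained sketch using toy types. `IDataPipeline`, `Step`, `FromRows`, `AddConcat` and `AddUpperCase` are hypothetical names chosen for illustration, not ML.NET APIs; rows are modelled as in-memory dictionaries of strings, and each step lazily wraps the one before it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types for illustration only; none of these are ML.NET APIs.
public interface IDataPipeline
{
    IEnumerable<Dictionary<string, string>> GetRows();
}

// A step that produces its rows on demand from a function.
public sealed class Step : IDataPipeline
{
    private readonly Func<IEnumerable<Dictionary<string, string>>> _produce;
    public Step(Func<IEnumerable<Dictionary<string, string>>> produce) => _produce = produce;
    public IEnumerable<Dictionary<string, string>> GetRows() => _produce();
}

public static class PipelineExtensions
{
    // Source step holding a fixed set of rows (stand-in for a loader such as TextLoader).
    public static IDataPipeline FromRows(params Dictionary<string, string>[] rows) =>
        new Step(() => rows);

    // The previous step arrives as the `this` parameter, so calls chain left to right.
    // Adds a column that joins the named columns with commas.
    public static IDataPipeline AddConcat(this IDataPipeline input, string output, params string[] columns) =>
        new Step(() => input.GetRows().Select(row =>
            new Dictionary<string, string>(row) { [output] = string.Join(",", columns.Select(c => row[c])) }));

    // Upper-cases one column (stand-in for a further transform).
    public static IDataPipeline AddUpperCase(this IDataPipeline input, string column) =>
        new Step(() => input.GetRows().Select(row =>
            new Dictionary<string, string>(row) { [column] = row[column].ToUpperInvariant() }));
}

public static class Demo
{
    public static List<string> Run()
    {
        var pipeline = PipelineExtensions
            .FromRows(new Dictionary<string, string> { ["Bedrooms"] = "3", ["Grade"] = "a" })
            .AddUpperCase("Grade")
            .AddConcat("CategoryFeatures", "Bedrooms", "Grade");
        return pipeline.GetRows().Select(r => r["CategoryFeatures"]).ToList();
    }
}
```

Because every step returns a new `IDataPipeline` rather than mutating its input, the code reads in the same order as the data flows, just as a LINQ query does.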

## More complex examples

You could also write extension methods that combine multiple steps, which are then consumed in exactly the same way. Something like the following (I've created a hypothetical `IDataPipeline` to represent any pipeline step that produces data):

```
// Extension methods must be static members of a static class.
public static class PipelineExtensions
{
    public static IDataPipeline CreateCategories(this IDataPipeline input, IHostEnvironment env)
    {
        return input.AddConcatTransform(env, "CategoryFeatures",
                "Bedrooms", "Bathrooms", "Floors", "Waterfront", "View", "Condition", "Grade",
                "YearBuilt", "YearRenovated", "Zipcode")
            .AddCategoricalTransform("CategoryFeatures");
    }
}
```
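A toy, self-contained version of the same idea follows. All types and methods here (`IDataPipeline`, `Step`, `FromRows`, `AddConcat`, `SelectColumns`, `CreateCategories`) are hypothetical stand-ins rather than ML.NET APIs; the point is only that a composite step is itself an extension method, so callers cannot tell it apart from a primitive one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types for illustration only; none of these are ML.NET APIs.
public interface IDataPipeline
{
    IEnumerable<Dictionary<string, string>> GetRows();
}

// A step that produces its rows on demand from a function.
public sealed class Step : IDataPipeline
{
    private readonly Func<IEnumerable<Dictionary<string, string>>> _produce;
    public Step(Func<IEnumerable<Dictionary<string, string>>> produce) => _produce = produce;
    public IEnumerable<Dictionary<string, string>> GetRows() => _produce();
}

public static class PipelineExtensions
{
    // Source step holding a fixed set of rows (stand-in for a loader).
    public static IDataPipeline FromRows(params Dictionary<string, string>[] rows) =>
        new Step(() => rows);

    // Adds a column that joins the named columns with commas.
    public static IDataPipeline AddConcat(this IDataPipeline input, string output, params string[] columns) =>
        new Step(() => input.GetRows().Select(row =>
            new Dictionary<string, string>(row) { [output] = string.Join(",", columns.Select(c => row[c])) }));

    // Keeps only the listed columns.
    public static IDataPipeline SelectColumns(this IDataPipeline input, params string[] columns) =>
        new Step(() => input.GetRows().Select(row => columns.ToDictionary(c => c, c => row[c])));

    // Composite step built from the two steps above, but called exactly like them.
    public static IDataPipeline CreateCategories(this IDataPipeline input) =>
        input.AddConcat("CategoryFeatures", "Bedrooms", "Bathrooms")
             .SelectColumns("CategoryFeatures");
}

public static class Demo
{
    public static Dictionary<string, string> Run()
    {
        var pipeline = PipelineExtensions
            .FromRows(new Dictionary<string, string> { ["Bedrooms"] = "3", ["Bathrooms"] = "2", ["Price"] = "100" })
            .CreateCategories();
        return pipeline.GetRows().Single();
    }
}
```

Grouping steps this way lets a team package a recurring feature-engineering recipe behind one well-named method without changing how pipelines are written.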

You could easily merge data from two pipelines:

```
var input1 = new TextLoader(...)
        .DoSomeTransforms();
var input2 = new TextLoader(...)
        .DoSomeMoreTransforms();

var input = input1.ConcatenateRows(input2);
```
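Under the same kind of toy model (hypothetical types, not ML.NET), row concatenation can be sketched as an extension method that defers to LINQ's `Enumerable.Concat`, so neither input pipeline is modified and both remain usable on their own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types for illustration only; none of these are ML.NET APIs.
public interface IDataPipeline
{
    IEnumerable<Dictionary<string, string>> GetRows();
}

// A step that produces its rows on demand from a function.
public sealed class Step : IDataPipeline
{
    private readonly Func<IEnumerable<Dictionary<string, string>>> _produce;
    public Step(Func<IEnumerable<Dictionary<string, string>>> produce) => _produce = produce;
    public IEnumerable<Dictionary<string, string>> GetRows() => _produce();
}

public static class PipelineExtensions
{
    // Source step holding a fixed set of rows (stand-in for a loader).
    public static IDataPipeline FromRows(params Dictionary<string, string>[] rows) =>
        new Step(() => rows);

    // Appends the rows of `other` after the rows of `input`, lazily, like Enumerable.Concat.
    public static IDataPipeline ConcatenateRows(this IDataPipeline input, IDataPipeline other) =>
        new Step(() => input.GetRows().Concat(other.GetRows()));
}

public static class Demo
{
    public static List<string> Run()
    {
        var input1 = PipelineExtensions.FromRows(
            new Dictionary<string, string> { ["Id"] = "1" },
            new Dictionary<string, string> { ["Id"] = "2" });
        var input2 = PipelineExtensions.FromRows(
            new Dictionary<string, string> { ["Id"] = "3" });

        var input = input1.ConcatenateRows(input2);
        return input.GetRows().Select(r => r["Id"]).ToList();
    }
}
```

This toy version does not check that both inputs share a schema; a real implementation would presumably need to reconcile column names and types before merging.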

You could take advantage of tuples to split the data pipeline, so that different steps can be applied to each part before the parts are later merged:

```
var (train, test) = input.AddTrainTestSplit(...);

var processedTrain = train.DoSomeTransforms();
var processedTest = test.DoSomeMoreTransforms();
```
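A sketch of the tuple-returning split, again with hypothetical toy types rather than ML.NET APIs. The `testEvery` parameter and the deterministic index-based rule are assumptions made for illustration; a real splitter would more likely shuffle or hash rows. Naming the tuple elements `Train` and `Test` enables `var (train, test) = ...` deconstruction:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types for illustration only; none of these are ML.NET APIs.
public interface IDataPipeline
{
    IEnumerable<Dictionary<string, string>> GetRows();
}

// A step that produces its rows on demand from a function.
public sealed class Step : IDataPipeline
{
    private readonly Func<IEnumerable<Dictionary<string, string>>> _produce;
    public Step(Func<IEnumerable<Dictionary<string, string>>> produce) => _produce = produce;
    public IEnumerable<Dictionary<string, string>> GetRows() => _produce();
}

public static class PipelineExtensions
{
    // Source step holding a fixed set of rows (stand-in for a loader).
    public static IDataPipeline FromRows(params Dictionary<string, string>[] rows) =>
        new Step(() => rows);

    // Splits one pipeline into two. Rows whose zero-based index is a multiple of
    // `testEvery` go to the test side; all other rows go to the training side.
    public static (IDataPipeline Train, IDataPipeline Test) AddTrainTestSplit(
        this IDataPipeline input, int testEvery)
    {
        IDataPipeline train = new Step(() => input.GetRows().Where((row, i) => i % testEvery != 0));
        IDataPipeline test = new Step(() => input.GetRows().Where((row, i) => i % testEvery == 0));
        return (train, test);
    }

    // Adds a constant "Tag" column (stand-in for a branch-specific transform).
    public static IDataPipeline AddTag(this IDataPipeline input, string tag) =>
        new Step(() => input.GetRows().Select(row =>
            new Dictionary<string, string>(row) { ["Tag"] = tag }));
}

public static class Demo
{
    public static (string Train, string Test) Run()
    {
        var input = PipelineExtensions.FromRows(
            new[] { "a", "b", "c", "d" }
                .Select(id => new Dictionary<string, string> { ["Id"] = id })
                .ToArray());

        var (train, test) = input.AddTrainTestSplit(testEvery: 2);

        // Each branch gets its own transforms; results are captured in new variables.
        var processedTrain = train.AddTag("train");
        var processedTest = test.AddTag("test");

        string Describe(IDataPipeline p) =>
            string.Join(";", p.GetRows().Select(r => r["Id"] + ":" + r["Tag"]));
        return (Describe(processedTrain), Describe(processedTest));
    }
}
```

Because each step returns a new pipeline instead of mutating its input, the per-branch results have to be assigned to new variables; discarding them would silently drop the transforms.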

# Summary

This is an outline proposal for an alternative API that could be used alongside, or instead of, the one proposed in issue #371. There are still some rough edges here and there, but I hope that this will start a discussion of the possibilities provided by a fluent API.

Proposal for Fluent API #474