Chains of Chains

It is possible to nest `EstimatorChain`s inside one another, fit them, and use them to transform data. Fitting such a pipeline produces a nested `TransformerChain`.

Question: Is this intended behavior? Do we want to allow this sort of nesting in the V1 API?

I think the proper way to handle nesting is to **flatten the structure before the fit and return a single `EstimatorChain`**. Because there is no forking or joining, I believe nested and non-nested pipelines are identical except for the type of the returned object. Data transformed by these objects should be the same whether or not the pipeline is nested (and it is, in my limited testing).
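To make the proposal concrete, here is a minimal sketch of "flatten before the fit". It deliberately does not use the ML.NET API: a chain is modeled as a hypothetical `List<object>` whose elements are either a stage name (a leaf estimator) or another list (a nested chain), and `Flatten` is a hypothetical helper. It splices every sub-chain into its parent in place, so the execution order is unchanged and only one flat chain would ever be fit. The example chain mirrors the accidentally nested Adult pipeline shown below.

```cs
using System;
using System.Collections.Generic;

// Hypothetical model of a pipeline (NOT the ML.NET API): each element is either a
// leaf stage, represented by its name, or a nested chain (another List<object>).
static List<string> Flatten(IEnumerable<object> chain)
{
    var result = new List<string>();
    foreach (var stage in chain)
    {
        switch (stage)
        {
            case string leaf:
                // A single estimator: keep it where it is.
                result.Add(leaf);
                break;
            case IEnumerable<object> subChain:
                // A nested chain: splice its flattened stages in place,
                // which preserves the overall execution order.
                result.AddRange(Flatten(subChain));
                break;
            default:
                throw new ArgumentException($"Unexpected stage type: {stage?.GetType()}");
        }
    }
    return result;
}

// Same structure as the accidentally nested Adult pipeline in this issue.
var nestedPipeline = new List<object>
{
    "Concatenate(NumericalFeatures)",
    "Concatenate(CategoricalFeatures)",
    new List<object>
    {
        "OneHotHashEncoding(CategoricalFeatures)",
        "Concatenate(Features)",
        "LogisticRegression",
    },
};

// Flatten once, before "fitting", so a single flat chain is all that gets fit.
var flatPipeline = Flatten(nestedPipeline);
Console.WriteLine(string.Join(" -> ", flatPipeline));
```

Because flattening is recursive, arbitrarily deep nesting (e.g. several errant parentheses) collapses to the same five-stage chain.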

Take a look at the following example, where we featurize the UCI Adult dataset and train a logistic regression model.

```cs
var mlContext = new MLContext(seed: 1, conc: 1);

// Load the Adult (tiny) dataset
var data = mlContext.Data.LoadFromTextFile<Adult>(GetDataPath(TestDatasets.adult.trainFilename),
    hasHeader: TestDatasets.adult.fileHasHeader,
    separatorChar: TestDatasets.adult.fileSeparator);

// Create the learning pipeline
var pipeline = mlContext.Transforms.Concatenate("NumericalFeatures", Adult.NumericalFeatures)
    .Append(mlContext.Transforms.Concatenate("CategoricalFeatures", Adult.CategoricalFeatures))
    .Append(mlContext.Transforms.Categorical.OneHotHashEncoding("CategoricalFeatures",
        invertHash: 2, outputKind: OneHotEncodingTransformer.OutputKind.Bag))
    .Append(mlContext.Transforms.Concatenate("Features", "NumericalFeatures", "CategoricalFeatures"))
    .Append(mlContext.BinaryClassification.Trainers.LogisticRegression());

// Train the model.
var model = pipeline.Fit(data);
```

Here, `pipeline` is an `EstimatorChain<BinaryPredictionTransformer<...>>` and `model` is a `TransformerChain<BinaryPredictionTransformer<...>>`.

It's also possible to nest the pipeline. Perhaps you accidentally put an errant `)` here and there, and then you have this:
```cs
// Create the learning pipeline
var pipeline = mlContext.Transforms.Concatenate("NumericalFeatures", Adult.NumericalFeatures)
    .Append(mlContext.Transforms.Concatenate("CategoricalFeatures", Adult.CategoricalFeatures))
    .Append(mlContext.Transforms.Categorical.OneHotHashEncoding("CategoricalFeatures",
        invertHash: 2, outputKind: OneHotEncodingTransformer.OutputKind.Bag) // <-- missing a )
    .Append(mlContext.Transforms.Concatenate("Features", "NumericalFeatures", "CategoricalFeatures"))
    .Append(mlContext.BinaryClassification.Trainers.LogisticRegression())); // <-- extra )
```

Now `pipeline` is an `EstimatorChain<TransformerChain<BinaryPredictionTransformer<...>>>` (its last estimator is itself an `EstimatorChain`), and the fitted model, `nestedModel`, is a `TransformerChain<TransformerChain<BinaryPredictionTransformer<...>>>`.

If I compare the two (where `var predictor = model.LastTransformer` and `var nestedPredictor = nestedModel.LastTransformer.LastTransformer`), the learned parameters (bias and weights) and the scores on the transformed data are identical:
```cs
// Score the same data with both models (assumed here: the training data).
var transformedData = model.Transform(data);
var nestedTransformedData = nestedModel.Transform(data);

// The learned parameters match.
// True!
Assert.Equal(predictor.Model.SubModel.Bias, nestedPredictor.Model.SubModel.Bias);
int nFeatures = predictor.Model.SubModel.Weights.Count;
for (int i = 0; i < nFeatures; i++)
    // True!
    Assert.Equal(predictor.Model.SubModel.Weights[i], nestedPredictor.Model.SubModel.Weights[i]);

// The scores match row by row.
var transformedRows = mlContext.Data.CreateEnumerable<BinaryPrediction>(transformedData, reuseRowObject: false).ToArray();
var nestedTransformedRows = mlContext.Data.CreateEnumerable<BinaryPrediction>(nestedTransformedData, reuseRowObject: false).ToArray();
for (int i = 0; i < transformedRows.Length; i++)
    // True!
    Assert.Equal(transformedRows[i].Score, nestedTransformedRows[i].Score);
```
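This equivalence is what you would expect when there is no forking or joining: each stage only consumes the previous stage's output, so grouping consecutive stages into a sub-chain does not change the composed function (function composition is associative). The sketch below illustrates that with hypothetical toy numeric stages (`scale`, `shift`, `square`), not ML.NET transformers; `ApplyNested` and `ApplyFlat` are likewise hypothetical helpers.

```cs
using System;
using System.Collections.Generic;

// Hypothetical toy model (NOT ML.NET): a chain element is either a stage
// (Func<double, double>) or a nested chain (another List<object>).
static double ApplyNested(IEnumerable<object> chain, double x)
{
    foreach (var stage in chain)
    {
        x = stage switch
        {
            Func<double, double> f => f(x),                           // leaf stage
            IEnumerable<object> subChain => ApplyNested(subChain, x), // recurse into the sub-chain
            _ => throw new ArgumentException("Unexpected stage type"),
        };
    }
    return x;
}

// Apply an already-flat chain: each stage's output feeds the next one.
static double ApplyFlat(IEnumerable<Func<double, double>> stages, double x)
{
    foreach (var f in stages)
    {
        x = f(x);
    }
    return x;
}

Func<double, double> scale = v => v * 2.0;
Func<double, double> shift = v => v + 3.0;
Func<double, double> square = v => v * v;

// The same three stages: once with the last two grouped into a sub-chain, once flat.
var nestedChain = new List<object> { scale, new List<object> { shift, square } };
var flatChain = new List<Func<double, double>> { scale, shift, square };

foreach (var input in new[] { -1.5, 0.0, 2.0, 10.0 })
{
    Console.WriteLine($"{input}: nested = {ApplyNested(nestedChain, input)}, flat = {ApplyFlat(flatChain, input)}");
}
```

Since both versions run the same operations in the same order, the results are bitwise identical, which matches the `Assert.Equal` checks passing on the real models.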

Chains of Chains #2820

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Chains of Chains #2820

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions