[Feature] Support more than 1 input for `VotingClassifier` #1016

eddiebergman · 2023-08-18T15:23:24Z

Hi there,

I am new to ONNX in general, so apologies if the issue is misplaced or I am missing something fundamental.

I'm coming from the tool autosklearn and planning to introduce some basic ONNX support by exporting the models it finds after optimizing over possible pipelines. These models will mostly be an ensemble (VotingClassifier) whose members are themselves Pipelines, each with its own imputation strategy, feature preprocessing and estimator.

Based on the error below, it seems that using a VotingClassifier would require all features to be numeric (or at least of the same TensorType) to be viable? Is this correct? Is there something fundamental which would prevent the SklearnVotingClassifier operator from working with more than 1 input?

I am linking to this issue in case anyone using autosklearn would like to enable ONNX support and is able to contribute! I've included a reproducible example and the traceback.

Reproducible Example

Apologies for using openml; sklearn's toy datasets do not have such varied column types.

from __future__ import annotations


def main():
    import openml
    from mlprodict.onnx_conv import guess_schema_from_data
    from onnxruntime import InferenceSession
    from skl2onnx import to_onnx
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OrdinalEncoder

    dataset = openml.datasets.get_dataset(31)
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)

    model = VotingClassifier(
        estimators=[
            (
                "est",
                Pipeline(
                    steps=[
                        ("imputer", SimpleImputer(strategy="most_frequent")),
                        ("encoder", OrdinalEncoder()),
                        ("rf", RandomForestClassifier(n_estimators=10)),
                    ],
                ),
            ),
        ],
    )
    model.fit(X, y)
    schema = guess_schema_from_data(X)

    # Errors here
    onnx_model = to_onnx(model=model, initial_types=schema)

    sess = InferenceSession(onnx_model.SerializeToString())
    inputs = {c: X[c].to_numpy().reshape((-1, 1)) for c in X.columns}
    got = sess.run(None, inputs)

    print(got)


if __name__ == "__main__":
    main()

Traceback

/blank/.venv/lib/python3.10/site-packages/openml/datasets/functions.py:438: FutureWarning: Starting from Version 0.15 `download_data`, `download_qualities`, and `download_features_meta_data` will all be ``False`` instead of ``True`` by default to enable lazy loading. To disable this message until version 0.15 explicitly set `download_data`, `download_qualities`, and `download_features_meta_data` to a bool while calling `get_dataset`.
  warnings.warn(
Traceback (most recent call last):
  File "/blank/onnx-test.py", line 45, in <module>
    main()
  File "/blank/onnx-test.py", line 35, in main
    onnx_model = to_onnx(model=model, initial_types=schema)
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/convert.py", line 306, in to_onnx
    return convert_sklearn(
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/convert.py", line 208, in convert_sklearn
    onnx_model = convert_topology(
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 1532, in convert_topology
    topology.convert_operators(container=container, verbose=verbose)
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 1348, in convert_operators
    self.call_shape_calculator(operator)
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 1163, in call_shape_calculator
    operator.infer_types()
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 652, in infer_types
    shape_calc(self)
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/shape_calculators/voting_classifier.py", line 8, in voting_classifier_shape_calculator
    return _calculate_linear_classifier_output_shapes(
  File "/blank/.venv/lib/python3.10/site-packages/skl2onnx/common/shape_calculator.py", line 43, in _calculate_linear_classifier_output_shapes
    check_input_and_output_numbers(
  File "/blank/.venv/lib/python3.10/site-packages/onnxconverter_common/utils.py", line 295, in check_input_and_output_numbers
    raise RuntimeError(
RuntimeError: For operator SklearnVotingClassifier (type: SklearnVotingClassifier), at most 1 input(s) is(are) supported but we got 20 input(s) which are
['checking_status', 'duration', 'credit_history', 'purpose', 'credit_amount', 'savings_status', 'employment', 'installment_commitment',
'personal_status', 'other_parties', 'residence_since', 'property_magnitude', 'age', 'other_payment_plans', 'housing',
'existing_credits', 'job', 'num_dependents', 'own_telephone', 'foreign_worker']


xadupre · 2023-10-03T13:33:25Z

The converter expects a single tensor as input. You can use a ColumnTransformer to concatenate all columns into a single tensor. I also put the encoder first, because the ONNX Imputer only supports numerical values. This is the pipeline validated in PR #1030.

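from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# X is the DataFrame loaded in the reproducible example above.
model = Pipeline(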
    steps=[
        (
            "concat",
            ColumnTransformer(
                [("concat", "passthrough", list(range(X.shape[1])))],
                sparse_threshold=0,
            ),
        ),
        (
            "voting",
            VotingClassifier(
                flatten_transform=False,
                estimators=[
                    (
                        "est",
                        Pipeline(
                            steps=[
                                # This encoder is placed before SimpleImputer because
                                # onnx does not support text for Imputer
                                ("encoder", OrdinalEncoder()),
                                (
                                    "imputer",
                                    SimpleImputer(strategy="most_frequent"),
                                ),
                                (
                                    "rf",
                                    RandomForestClassifier(
                                        n_estimators=4,
                                        max_depth=4,
                                        random_state=0,
                                    ),
                                ),
                            ],
                        ),
                    ),
                ],
            ),
        ),
    ]
)
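
To illustrate the concatenation step on its own, here is a small sketch, independent of ONNX, showing that a `ColumnTransformer` with a single `"passthrough"` entry and `sparse_threshold=0` turns a multi-column DataFrame into one dense 2-D array. The toy DataFrame is made up for illustration; only the `ColumnTransformer` configuration is taken from the pipeline above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer

# Made-up data with mixed numeric and text columns.
toy = pd.DataFrame(
    {
        "duration": [6, 48, 12],
        "credit_amount": [1169.0, 5951.0, 2096.0],
        "purpose": ["radio/tv", "radio/tv", "education"],
    }
)

# Select every column by position and pass it through unchanged;
# sparse_threshold=0 forces a dense result.
concat = ColumnTransformer(
    [("concat", "passthrough", list(range(toy.shape[1])))],
    sparse_threshold=0,
)
stacked = concat.fit_transform(toy)

print(type(stacked).__name__, stacked.shape)
```

The idea is that everything downstream of this step, including the VotingClassifier, receives one array rather than one input per column.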

eddiebergman mentioned this issue Aug 18, 2023

What's in store for Auto-Sklearn? -- From the Developers automl/auto-sklearn#1677

Open

xadupre mentioned this issue Oct 2, 2023

VotingClassifier and input DataFrame #1030

Merged
