ITransformer yields IRowToRowMapper, make prediction engine faster #986

Closed
TomFinley opened this issue Sep 21, 2018 · 0 comments

Things like ITransformer (or its predecessor, IDataTransform), given an IDataView, can produce another IDataView. This works well for things like streaming over billions of records, but for just one record, the whole machinery around setting up a cursor is a lot of overhead.

For example, consider the prediction engine.

public sealed class PredictionEngine<TSrc, TDst>

Currently, this composes an IDataView consisting of one item, then applies the transform chain to it, and so on. But this is pretty heavyweight. Setting up the dynamically typed delegates, binding to the appropriate types, and so on, for every single data point absolutely dwarfs any actual computation that happens in many pipelines. Again, this system is fine if you're doing what it was designed to do, which is to stream efficiently over billions of records, but on a small scale it's not great.

There is an existing IRowToRowMapper interface that we might be able to exploit.

This interface is somewhat analogous to IDataView, and the IRowToRowMapper.GetRow method is somewhat analogous to IDataView.GetRowCursor. Many existing IDataTransform implementations also implement it, to enable faster mapping. We can exploit this same functionality through ITransformer.
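
To make the cost argument concrete, here is a minimal sketch of the row-mapper idea. It assumes simplified, hypothetical types (`ISketchRow`, `InputRow`, `ScaleMapper`, `RowMapperDemo`) that are not the ML.NET interfaces: the getter delegates are bound once, and each later prediction only swaps the input value and invokes the already-bound getter.

```csharp
using System;

// Hypothetical, simplified stand-ins for illustration; NOT the ML.NET types.
public interface ISketchRow
{
    // Returns a delegate that reads the current value of the given column.
    Func<float> GetGetter(int col);
}

// A mutable single-column input row: set Value, then invoke the output getter.
public sealed class InputRow : ISketchRow
{
    public float Value;
    public Func<float> GetGetter(int col) => () => Value;
}

// A row-to-row mapper: wraps an input row and exposes a derived row.
public sealed class ScaleMapper
{
    private readonly float _scale;

    // Counts how many times delegates were set up, to show it happens once.
    public int BindCount;

    public ScaleMapper(float scale) { _scale = scale; }

    public ISketchRow GetRow(ISketchRow input)
    {
        BindCount++;
        Func<float> src = input.GetGetter(0); // bound once, reused afterwards
        return new MappedRow(() => src() * _scale);
    }

    private sealed class MappedRow : ISketchRow
    {
        private readonly Func<float> _getter;
        public MappedRow(Func<float> getter) { _getter = getter; }
        public Func<float> GetGetter(int col) => _getter;
    }
}

public static class RowMapperDemo
{
    public static float[] PredictMany(ScaleMapper mapper, float[] inputs)
    {
        var input = new InputRow();
        Func<float> getter = mapper.GetRow(input).GetGetter(0); // one-time setup

        var results = new float[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            input.Value = inputs[i]; // swap in the next example
            results[i] = getter();   // no cursor and no rebinding per example
        }
        return results;
    }
}
```

Calling `RowMapperDemo.PredictMany(new ScaleMapper(2f), new[] { 1f, 2f, 3f })` yields 2, 4, 6 while `BindCount` stays at 1. The cursor-based path would instead pay the setup cost for every example.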

So we can do this:

  • Allow ITransformer implementations, in addition to providing IDataViews through transformation of datasets, to optionally provide IRowToRowMapper implementors as well.

  • Exploit this new functionality to make PredictionEngine faster, on applicable pipelines.

This will also allow us to check, in the prediction engine, whether a pipeline really can be expressed in a row-to-row capacity.
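
The two steps above might look roughly like the following sketch. Every name in it (`ISketchTransformer`, `IsRowToRow`, `GetRowMapper`, `AddOne`, `DropNegatives`, `SketchPredictionEngine`) is hypothetical and chosen for illustration; rows are reduced to a single `float` to keep the example short, so this is not the actual ML.NET design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for illustration; NOT the ML.NET API.
public interface ISketchTransformer
{
    // Bulk path: lazily transform a whole stream of values.
    IEnumerable<float> Transform(IEnumerable<float> data);

    // True when each output row depends only on its own input row.
    bool IsRowToRow { get; }

    // Row path: only valid when IsRowToRow is true.
    Func<float, float> GetRowMapper();
}

public sealed class AddOne : ISketchTransformer
{
    public IEnumerable<float> Transform(IEnumerable<float> data) => data.Select(x => x + 1f);
    public bool IsRowToRow => true;
    public Func<float, float> GetRowMapper() => x => x + 1f;
}

// A filter can drop rows, so it cannot be expressed as a row-to-row mapping.
public sealed class DropNegatives : ISketchTransformer
{
    public IEnumerable<float> Transform(IEnumerable<float> data) => data.Where(x => x >= 0f);
    public bool IsRowToRow => false;
    public Func<float, float> GetRowMapper() =>
        throw new InvalidOperationException("Not a row-to-row transform.");
}

public sealed class SketchPredictionEngine
{
    private readonly Func<float, float> _map;

    public SketchPredictionEngine(IReadOnlyList<ISketchTransformer> chain)
    {
        // Check up front that the whole pipeline is row-to-row,
        // so failure happens at construction and not on the first prediction.
        foreach (var t in chain)
        {
            if (!t.IsRowToRow)
                throw new ArgumentException("Pipeline contains a transform that is not row-to-row.");
        }

        // Compose the per-row mappers once; Predict only invokes the result.
        Func<float, float> composed = x => x;
        foreach (var t in chain)
        {
            Func<float, float> prev = composed;
            Func<float, float> next = t.GetRowMapper();
            composed = x => next(prev(x));
        }
        _map = composed;
    }

    public float Predict(float x) => _map(x);
}
```

With a chain of two `AddOne` transformers, `Predict(1f)` returns 3. Building the engine over a chain that contains `DropNegatives` throws at construction, which is the kind of check the last sentence describes.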

@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022