Description
Things like ITransformer (or its predecessor, IDataTransform), given an IDataView, can produce another IDataView. This works well for streaming over billions of records, but for just one record, the whole machinery around setting up a cursor is a lot of overhead.
For example, consider the prediction engine.
What it currently does is compose an IDataView consisting of one item, then apply the transform chain to it, and so on. But this is pretty heavyweight. Setting up the dynamically typed delegates, binding to the appropriate types, and so on, for every single data point absolutely dwarfs any actual computation that happens in many pipelines. Again, this system is fine if you're doing what it was designed to do, streaming efficiently over billions of records, but on a small scale it's not great.
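To make the cost concrete, here is a minimal, language-neutral sketch in Python. It is not ML.NET code: `ScaleTransform`, `_bind`, and `predict_one` are hypothetical names, and `_bind` only stands in for the expensive setup (resolving types, building delegates) described above. The sketch counts how often that setup runs when streaming a dataset versus predicting one record at a time.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, float]


class ScaleTransform:
    """Hypothetical transform that multiplies one column by a constant."""

    def __init__(self, column: str, factor: float) -> None:
        self.column = column
        self.factor = factor
        self.bind_count = 0  # how many times the setup step has run

    def _bind(self) -> Callable[[Row], Row]:
        # Stand-in for the expensive setup (resolving types, building delegates).
        self.bind_count += 1
        column, factor = self.column, self.factor

        def apply(row: Row) -> Row:
            out = dict(row)
            out[column] = row[column] * factor
            return out

        return apply

    def transform(self, rows: Iterable[Row]) -> Iterator[Row]:
        # Dataset-style path: set up once per pass, then stream every row.
        apply = self._bind()
        for row in rows:
            yield apply(row)


def predict_one(transform: ScaleTransform, row: Row) -> Row:
    # Single-record path: wrap the row in a one-item dataset and start a new pass.
    return next(transform.transform([row]))


streaming = ScaleTransform("x", 2.0)
streamed = list(streaming.transform({"x": float(i)} for i in range(1000)))

per_record = ScaleTransform("x", 2.0)
predicted = [predict_one(per_record, {"x": float(i)}) for i in range(1000)]

# Setup count for the streaming pass, then for the per-record predictions.
print(streaming.bind_count, per_record.bind_count)
```

Both paths produce the same rows, but the streaming pass runs the setup once while the per-record path runs it once for every record.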
There is an existing IRowToRowMapper interface that we might be able to exploit.
This interface is somewhat analogous to IDataView, and the IRowToRowMapper.GetRow method is somewhat analogous to IDataView.GetRowCursor. Many existing IDataTransform implementations already implement it to enable faster mapping. We can exploit this same functionality through ITransformer.
So we can do this:
- Allow ITransformer implementations, in addition to providing IDataViews through transformation of datasets, to optionally provide IRowToRowMapper implementors.
- Exploit this new functionality to make PredictionEngine faster on applicable pipelines.
This will also let the prediction engine check whether a pipeline really can be expressed in a row-to-row capacity.
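The proposal could look roughly like the following sketch, again in Python rather than ML.NET's C#. Everything here is an assumption made for illustration: `Transformer`, `get_row_mapper`, `Scale`, `RunningSum`, and this `PredictionEngine` are hypothetical stand-ins, not the library's actual types or signatures. A transformer optionally exposes a row mapper, and the engine binds the mappers once at construction, falling back to the dataset path when any step cannot map row to row.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, float]
RowFn = Callable[[Row], Row]


class Transformer:
    """Hypothetical base type: always transforms datasets, optionally maps rows."""

    def transform(self, rows: Iterable[Row]) -> Iterator[Row]:
        raise NotImplementedError

    def get_row_mapper(self) -> Optional[RowFn]:
        return None  # default: no row-to-row capability


class Scale(Transformer):
    def __init__(self, column: str, factor: float) -> None:
        self.column, self.factor = column, factor

    def get_row_mapper(self) -> Optional[RowFn]:
        column, factor = self.column, self.factor
        return lambda row: {**row, column: row[column] * factor}

    def transform(self, rows: Iterable[Row]) -> Iterator[Row]:
        mapper = self.get_row_mapper()
        return (mapper(row) for row in rows)


class RunningSum(Transformer):
    """Depends on earlier rows, so it offers no row mapper."""

    def __init__(self, column: str) -> None:
        self.column = column

    def transform(self, rows: Iterable[Row]) -> Iterator[Row]:
        total = 0.0
        for row in rows:
            total += row[self.column]
            yield {**row, self.column: total}


class PredictionEngine:
    def __init__(self, chain: List[Transformer]) -> None:
        self._chain = chain
        mappers = [t.get_row_mapper() for t in chain]
        # The pipeline is row-to-row only if every step provides a mapper.
        self.is_row_to_row = all(m is not None for m in mappers)
        self._mappers = mappers if self.is_row_to_row else None

    def predict(self, row: Row) -> Row:
        if self._mappers is not None:
            # Fast path: reuse the mappers bound once in the constructor.
            for mapper in self._mappers:
                row = mapper(row)
            return row
        # Fallback: wrap the row in a one-item dataset and run the chain.
        rows: Iterable[Row] = [row]
        for transformer in self._chain:
            rows = transformer.transform(rows)
        return next(iter(rows))


fast = PredictionEngine([Scale("x", 2.0), Scale("x", 3.0)])
slow = PredictionEngine([Scale("x", 2.0), RunningSum("x")])
print(fast.is_row_to_row, slow.is_row_to_row)
```

Returning `None` from the optional method keeps the capability opt-in, so existing transformers keep working and the engine can report up front whether a given pipeline qualifies for the fast path.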