Things like ITransformer (or its predecessor, IDataTransform), given an IDataView, can produce another IDataView. This works well for doing things like streaming over billions of records, but for just one record, the whole machinery around setting up a cursor is a lot of overhead.
For example, consider the prediction engine (machinelearning/src/Microsoft.ML.Api/PredictionEngine.cs, line 149 in ecb9126).
What this currently does is compose an IDataView consisting of one item, then apply the transform chain to it, and so on. But this is pretty heavyweight. Setting up the dynamically typed delegates, binding to the appropriate types, and so on, for every single data point absolutely dwarfs any actual computation that happens in many pipelines. Again, this system is fine if you're doing what it was designed to do, which is to stream efficiently over billions of records, but on a small scale it's not great.
There is an existing IRowToRowMapper interface that we might be able to exploit (machinelearning/src/Microsoft.ML.Core/Data/ISchemaBindableMapper.cs, line 91 in ecb9126).
This interface is somewhat analogous to IDataView, and the IRowToRowMapper.GetRow method is somewhat analogous to IDataView.GetRowCursor. This is something many existing IDataTransform implementors would implement, to enable faster mapping. We can exploit this same functionality through ITransformer.
So we can do this:
Allow ITransformer implementors, in addition to providing IDataViews through transformation of datasets, to optionally also provide IRowToRowMapper implementors.
Exploit this new functionality to make PredictionEngine faster, on applicable pipelines.
This will also allow us to check in the prediction engine whether a pipeline really can be expressed in a row-to-row capacity.
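
To make the idea concrete, here is a rough, language-neutral sketch in Python. It is a toy model, not the ML.NET API: the `Transformer` and `PredictionEngine` classes, `get_row_mapper`, the `is_row_to_row` flag, and the `setups` counter are all hypothetical names invented for illustration. The `_bind` step stands in for the delegate setup and type binding described above. The engine binds once when every transformer can offer a row mapper, and otherwise falls back to wrapping each example in a one-item dataset.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, float]


class Transformer:
    """Hypothetical stand-in for ITransformer; always supports streaming."""

    def __init__(self, fn: Callable[[Row], Row], row_to_row: bool = True):
        self._fn = fn
        self._row_to_row = row_to_row
        self.setups = 0  # how many times the "expensive" binding step ran

    def _bind(self) -> Callable[[Row], Row]:
        # Stands in for creating typed delegates and binding to the schema.
        self.setups += 1
        return self._fn

    def transform(self, rows: Iterable[Row]) -> Iterator[Row]:
        # Streaming path: bind once per cursor, then map every row lazily.
        mapper = self._bind()
        return (mapper(r) for r in rows)

    def get_row_mapper(self) -> Optional[Callable[[Row], Row]]:
        # Optional capability: None means "not expressible row-to-row".
        return self._bind() if self._row_to_row else None


class PredictionEngine:
    """Hypothetical engine that prefers row mappers when all are available."""

    def __init__(self, chain: List[Transformer]):
        self._chain = chain
        mappers = [t.get_row_mapper() for t in chain]
        # The fast path needs every transformer in the chain to cooperate.
        self.is_row_to_row = all(m is not None for m in mappers)
        self._mappers = mappers if self.is_row_to_row else None

    def predict(self, row: Row) -> Row:
        if self._mappers is not None:
            # Fast path: reuse the mappers that were bound once, up front.
            for mapper in self._mappers:
                row = mapper(row)
            return row
        # Fallback: wrap the example in a one-item dataset and stream it,
        # which repeats the binding step on every call.
        rows: Iterable[Row] = [row]
        for t in self._chain:
            rows = t.transform(rows)
        return next(iter(rows))


def double(r: Row) -> Row:
    return {**r, "x": r["x"] * 2.0}


def add_one(r: Row) -> Row:
    return {**r, "x": r["x"] + 1.0}


# Every transformer can map row to row, so binding happens once each.
fast_chain = [Transformer(double), Transformer(add_one)]
fast_engine = PredictionEngine(fast_chain)
fast_out = [fast_engine.predict({"x": float(i)}) for i in range(3)]

# One transformer cannot, so the engine streams a one-item dataset per call.
slow_chain = [Transformer(double), Transformer(add_one, row_to_row=False)]
slow_engine = PredictionEngine(slow_chain)
slow_out = [slow_engine.predict({"x": float(i)}) for i in range(3)]
```

In this toy model both engines give the same predictions, but the fast engine runs the binding step once per transformer, while the fallback repeats it on every call. The `is_row_to_row` flag plays the role of the check mentioned above: it tells the caller whether the pipeline could be expressed in a row-to-row capacity at all.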