-
Notifications
You must be signed in to change notification settings - Fork 1.9k
PredictionEngine uses IRowToRowMapper, ITransformer can create IRowToRowMapper #973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// <summary> | ||
/// A row-to-row mapper that is the result of a chained application of multiple mappers. | ||
/// </summary> | ||
public sealed class ChainedRowToRowMapper : IRowToRowMapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chained [](start = 24, length = 7)
Can we rename it either to CompositeRowToRowMapper
(as this is similar to CompositeDataLoader
) or RowToRowMapperChain
(akin to TransformerChain
and EstimatorChain
) ? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me do the composite thing, since as an implementor of IRowToRowMapper
it should end with RowToRowMapper
. :D
In reply to: 219555464 [](ancestors = 219555464)
This PR does not reference an issue. Please fix this before merging. |
|
||
~PredictionEngine() | ||
{ | ||
_disposer?.Invoke(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_disposer [](start = 12, length = 9)
Just so that we don't forget. If we were planning to make IRow
disposable, does it call for making PredictionEngine
also disposable? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're probably right about that.
I think we need to discuss this more thoroughly. Maybe I'll add an issue.
In reply to: 219577322 [](ancestors = 219577322)
return new InputRow<TRow>(env, internalSchemaDefn); | ||
} | ||
|
||
public static IDataView LoadPipeWithPredictor(IHostEnvironment env, Stream modelStream, IDataView view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
maybe internal
? or else document it :) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is internal. This DataViewConstructionUtils
is an internal class.
In reply to: 219577553 [](ancestors = 219577553)
@@ -92,198 +360,126 @@ protected DataViewBase(IHostEnvironment env, string name, InternalSchemaDefiniti | |||
return new[] { GetRowCursor(predicate, rand) }; | |||
} | |||
|
|||
public abstract class DataViewCursorBase : RootCursorBase, IRowCursor | |||
public abstract class DataViewCursorBase : InputRowBase<TRow>, IRowCursor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataViewCursorBase [](start = 34, length = 18)
all of this code is a clone of something, right? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
@@ -26,6 +26,21 @@ public interface IRow<TRow> : IRow | |||
void FillValues(TRow row); | |||
} | |||
|
|||
/// <summary> | |||
/// This interface is an <see cref="IRow"/> with 'stringly typed' binding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringly [](start = 54, length = 8)
stringly [](start = 54, length = 8)
I like the word, but you probably meant 'strongly' ? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
/// This interface is an <see cref="IRow"/> with 'stringly typed' binding. | ||
/// It can accept values of type <typeparamref name="TRow"/> and present the value as a row. | ||
/// </summary> | ||
/// <typeparam name="TRow"></typeparam> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 8, length = 35)
[](start = 8, length = 35)
remove if empty #Closed
@@ -26,6 +26,21 @@ public interface IRow<TRow> : IRow | |||
void FillValues(TRow row); | |||
} | |||
|
|||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary [](start = 9, length = 7)
could you outline the contract of that?
If we get getters before the first Accept
, what happens?
Also, Accept
is a bit off, because it implies some choice on our part (accept or not).
This mirrors 'FillValues', so maybe ExtractValues
, or ReadValues
? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm yes.. let's do Extract
since that's a nice counterpoint to Fill
.
In reply to: 219579240 [](ancestors = 219579240)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also good call I should have a check on Position
.
In reply to: 219607464 [](ancestors = 219607464,219579240)
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
/// It can accept values of type <typeparamref name="TRow"/> and present the value as a row. | ||
/// </summary> | ||
/// <typeparam name="TRow"></typeparam> | ||
public interface IInputRow<TRow> : IRow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IInputRow [](start = 21, length = 9)
IInputRow [](start = 21, length = 9)
I want to codify a bit better the difference between IRow<T>
and IInputRow<T>
. The difference is mostly in how they are instantiated, right? The first one takes an existing row, and lets us read it into object, and the second one lets us take an object and treat it as a row?
So maybe we could have like IRowReadableAs<TRow>
and IRowBackedBy<TRow>
? or something like that? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're comfortable changing the API interface signatures that's fine.
In reply to: 219580213 [](ancestors = 219580213)
/// <summary> | ||
/// Constructs a row-to-row mapper based on an input schema. If <see cref="IsRowToRowMapper"/> | ||
/// is <c>false</c>, then an exception should be thrown. If the input schema is in any way | ||
/// unsuitable for constructing the mapper, an exception should likewise be thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an exception should likewise be thrown [](start = 52, length = 38)
I think this should be checked in TestEstimatorCore
somewhere #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're probably right... let me check the schemas.
In reply to: 219580497 [](ancestors = 219580497)
* ITransformer has IRowToRowMapper accessor.
5d819b7
to
dcc1e40
Compare
@@ -111,6 +112,14 @@ protected static TextLoader.Arguments MakeIrisTextLoaderArgs() | |||
{ | |||
var schema = transformer.GetOutputSchema(data.Schema); | |||
|
|||
// If it's a row to row mapper, then the output name. | |||
if (transformer.IsRowToRowMapper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsRowToRowMapper [](start = 32, length = 16)
How about adding mustFail
's for GetRowToRowMapper
on invalid data? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty good, let's do that. Also I see I screwed up the comments here.
In reply to: 219619342 [](ancestors = 219619342)
Thanks @markusweimer. I feel the overpowering urge to replace you with a script? 😀 In reply to: 423611894 [](ancestors = 423611894) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
Func<int, bool> toReturn = predicate; | ||
for (int i = _innerMappers.Length - 1; i <= 0; --i) | ||
toReturn = _innerMappers[i].GetDependencies(toReturn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toReturn [](start = 16, length = 8)
does this look right? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks right to me. We're chaining the delegates backwards.
In reply to: 219621916 [](ancestors = 219621916)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the idea is that the predicates are (as elsewhere) predicates on the outputs, so if we are chaining the row-to-row mappers, then while we transform forwards, we have to check the dependencies in the opposite direction.
In reply to: 219622203 [](ancestors = 219622203,219621916)
Please do! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_outputRow = cursorable.GetRow(outputRow); | ||
} | ||
|
||
~PredictionEngine() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have a finalizer without having IDisposable
on an object. See https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/dispose-pattern
Especially this part:
When the CLR needs to call a finalizer, it must postpone collection of the object’s memory until the next round of garbage collection (the finalizers run between collections). This means that the object’s memory (and all objects it refers to) will not be released for a longer period of time.
Therefore, relying exclusively on finalizers might not be appropriate in many scenarios when it is important to reclaim unmanaged resources as quickly as possible, when dealing with scarce resources, or in highly performant scenarios in which the added GC overhead of finalization is unacceptable.
In which we reduce the overhead of calls to
Predict
by avoiding the creation of cursors.PredictionEngine
used to create a "fake" data view under the hood, by wrapping aBatchPredictionEngine
, but only feeding in one item. This was a fairly heavy operation involving spooling up of cursors, including heavy reflection based processing on each row, etc. Now we are able to avoid this by using an existing interfaceIRowToRowMapper
.The other major part of the change is, of course, changing transformers so they can produce
IRowToRowMapper
, which they previously did not. In most cases this is relatively straightforward since nearly all transformers of interest are using theseIRowToRowMapper
under the hood to support their computation.IRowToRowMapper
enhancements, unification of some interfaces.ITransformer
hasIRowToRowMapper
accessor.PredictionEngine
now usesIRowToRowMapper
The effects of this are most dramatic using prediction engines outside of the ML.NET v0.1 pipeline API (e.g., using
IEstimator
based pipelines, pre-ML.NET v0.1 API, and so forth). The effect of this is that when predicting, now actual computation dominates the cost, as opposed to reflection based stuff. 😄 Due to architectural limitations of pipeline API models, these do not see benefit.Will perhaps update with timings later.
Fix #986.