Skip to content

PredictionEngine uses IRowToRowMapper, ITransformer can create IRowToRowMapper #973

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 21, 2018

Conversation

TomFinley
Copy link
Contributor

@TomFinley TomFinley commented Sep 21, 2018

In which we reduce the overhead of calls to Predict by avoiding the creation of cursors.

PredictionEngine used to create a "fake" data view under the hood, by wrapping a BatchPredictionEngine, but only feeding in one item. This was a fairly heavy operation involving spooling up of cursors, including heavy reflection based processing on each row, etc. Now we are able to avoid this by using an existing interface IRowToRowMapper.

The other major part of the change is, of course, changing transformers so they can produce IRowToRowMapper, which they previously did not. In most cases this is relatively straightforward since nearly all transformers of interest are using these IRowToRowMapper under the hood to support their computation.

  • IRowToRowMapper enhancements, unification of some interfaces.
  • ITransformer has IRowToRowMapper accessor.
  • PredictionEngine now uses IRowToRowMapper

The effects of this are most dramatic using prediction engines outside of the ML.NET v0.1 pipeline API (e.g., using IEstimator based pipelines, pre-ML.NET v0.1 API, and so forth). The effect of this is that when predicting, now actual computation dominates the cost, as opposed to reflection based stuff. 😄 Due to architectural limitations of pipeline API models, these do not see benefit.

Will perhaps update with timings later.

Fix #986.

/// <summary>
/// A row-to-row mapper that is the result of a chained application of multiple mappers.
/// </summary>
public sealed class ChainedRowToRowMapper : IRowToRowMapper
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chained [](start = 24, length = 7)

Can we rename it either to CompositeRowToRowMapper (as this is similar to CompositeDataLoader) or RowToRowMapperChain (akin to TransformerChain and EstimatorChain) ? #Resolved

Copy link
Contributor Author

@TomFinley TomFinley Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me do the composite thing, since as an implementor of IRowToRowMapper it should end with RowToRowMapper. :D


In reply to: 219555464 [](ancestors = 219555464)

@markusweimer
Copy link
Member

This PR does not reference an issue. Please fix this before merging.


~PredictionEngine()
{
_disposer?.Invoke();
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_disposer [](start = 12, length = 9)

Just so that we don't forget. If we were planning to make IRow disposable, does it call for making PredictionEngine also disposable? #Pending

Copy link
Contributor Author

@TomFinley TomFinley Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're probably right about that.

I think we need to discuss this more thoroughly. Maybe I'll add an issue.


In reply to: 219577322 [](ancestors = 219577322)

return new InputRow<TRow>(env, internalSchemaDefn);
}

public static IDataView LoadPipeWithPredictor(IHostEnvironment env, Stream modelStream, IDataView view)
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

maybe internal? or else document it :) #Closed

Copy link
Contributor Author

@TomFinley TomFinley Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is internal. This DataViewConstructionUtils is an internal class.


In reply to: 219577553 [](ancestors = 219577553)

@@ -92,198 +360,126 @@ protected DataViewBase(IHostEnvironment env, string name, InternalSchemaDefiniti
return new[] { GetRowCursor(predicate, rand) };
}

public abstract class DataViewCursorBase : RootCursorBase, IRowCursor
public abstract class DataViewCursorBase : InputRowBase<TRow>, IRowCursor
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataViewCursorBase [](start = 34, length = 18)

all of this code is a clone of something, right? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes let me add a comment about that.


In reply to: 219578467 [](ancestors = 219578467)

@@ -26,6 +26,21 @@ public interface IRow<TRow> : IRow
void FillValues(TRow row);
}

/// <summary>
/// This interface is an <see cref="IRow"/> with 'stringly typed' binding.
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringly [](start = 54, length = 8)

stringly [](start = 54, length = 8)

I like the word, but you probably meant 'strongly' ? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have just copy-pasted. :D


In reply to: 219578585 [](ancestors = 219578585)

/// This interface is an <see cref="IRow"/> with 'stringly typed' binding.
/// It can accept values of type <typeparamref name="TRow"/> and present the value as a row.
/// </summary>
/// <typeparam name="TRow"></typeparam>
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 8, length = 35)

[](start = 8, length = 35)

remove if empty #Closed

@@ -26,6 +26,21 @@ public interface IRow<TRow> : IRow
void FillValues(TRow row);
}

/// <summary>
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 9, length = 7)

could you outline the contract of that?
If we get getters before the first Accept, what happens?

Also, Accept is a bit off, because it implies some choice on our part (accept or not).
This mirrors 'FillValues', so maybe ExtractValues, or ReadValues? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm yes.. let's do Extract since that's a nice counterpoint to Fill.


In reply to: 219579240 [](ancestors = 219579240)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also good call I should have a check on Position.


In reply to: 219607464 [](ancestors = 219607464,219579240)

/// It can accept values of type <typeparamref name="TRow"/> and present the value as a row.
/// </summary>
/// <typeparam name="TRow"></typeparam>
public interface IInputRow<TRow> : IRow
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IInputRow [](start = 21, length = 9)

IInputRow [](start = 21, length = 9)

I want to codify a bit better the difference between IRow<T> and IInputRow<T>. The difference is mostly in how they are instantiated, right? The first one takes an existing row, and lets us read it into object, and the second one lets us take an object and treat it as a row?

So maybe we could have like IRowReadableAs<TRow> and IRowBackedBy<TRow> ? or something like that? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're comfortable changing the API interface signatures that's fine.


In reply to: 219580213 [](ancestors = 219580213)

/// <summary>
/// Constructs a row-to-row mapper based on an input schema. If <see cref="IsRowToRowMapper"/>
/// is <c>false</c>, then an exception should be thrown. If the input schema is in any way
/// unsuitable for constructing the mapper, an exception should likewise be thrown.
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an exception should likewise be thrown [](start = 52, length = 38)

I think this should be checked in TestEstimatorCore somewhere #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're probably right... let me check the schemas.


In reply to: 219580497 [](ancestors = 219580497)

@@ -111,6 +112,14 @@ protected static TextLoader.Arguments MakeIrisTextLoaderArgs()
{
var schema = transformer.GetOutputSchema(data.Schema);

// If it's a row to row mapper, then the output name.
if (transformer.IsRowToRowMapper)
Copy link
Contributor

@Zruty0 Zruty0 Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsRowToRowMapper [](start = 32, length = 16)

How about adding mustFail's for GetRowToRowMapper on invalid data? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty good, let's do that. Also I see I screwed up the comments here.


In reply to: 219619342 [](ancestors = 219619342)

@TomFinley
Copy link
Contributor Author

Thanks @markusweimer. I feel the overpowering urge to replace you with a script? 😀


In reply to: 423611894 [](ancestors = 423611894)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley TomFinley requested a review from zeahmed September 21, 2018 20:42
{
Func<int, bool> toReturn = predicate;
for (int i = _innerMappers.Length - 1; i <= 0; --i)
toReturn = _innerMappers[i].GetDependencies(toReturn);
Copy link
Member

@sfilipi sfilipi Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toReturn [](start = 16, length = 8)

does this look right? #ByDesign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me. We're chaining the delegates backwards.


In reply to: 219621916 [](ancestors = 219621916)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that the predicates are (as elsewhere) predicates on the outputs, so if we are chaining the row-to-row mappers, then while we transform forwards, we have to check the dependencies in the opposite direction.


In reply to: 219622203 [](ancestors = 219622203,219621916)

@markusweimer
Copy link
Member

Thanks @markusweimer. I feel the overpowering urge to replace you with a script? 😀

Please do!

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley TomFinley merged commit 46ba430 into dotnet:master Sep 21, 2018
@TomFinley TomFinley deleted the tfinley/RowMap branch September 21, 2018 21:43
_outputRow = cursorable.GetRow(outputRow);
}

~PredictionEngine()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have a finalizer without having IDisposable on an object. See https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/dispose-pattern

Especially this part:

When the CLR needs to call a finalizer, it must postpone collection of the object’s memory until the next round of garbage collection (the finalizers run between collections). This means that the object’s memory (and all objects it refers to) will not be released for a longer period of time.

Therefore, relying exclusively on finalizers might not be appropriate in many scenarios when it is important to reclaim unmanaged resources as quickly as possible, when dealing with scarce resources, or in highly performant scenarios in which the added GC overhead of finalization is unacceptable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants