Skip to content

Replace predicate with an IEnumerable<DataViewSchema.Column> for IRowToRowMapper.GetRow and ISchemaBoundRowMapper.GetRow #2796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 8, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions src/Microsoft.Data.DataView/DataViewSchema.cs
Original file line number Diff line number Diff line change
Expand Up @@ -217,14 +217,14 @@ private void CheckGetter<TValue>(Delegate getter)
/// <summary>
/// Get a getter delegate for one value of the annotations row.
/// </summary>
public ValueGetter<TValue> GetGetter<TValue>(int col)
public ValueGetter<TValue> GetGetter<TValue>(DataViewSchema.Column column)
{
if (!(0 <= col && col < Schema.Count))
throw new ArgumentOutOfRangeException(nameof(col));
var typedGetter = _getters[col] as ValueGetter<TValue>;
if (column.Index >= _getters.Length)
throw new ArgumentException(nameof(column));
var typedGetter = _getters[column.Index] as ValueGetter<TValue>;
if (typedGetter == null)
{
Debug.Assert(_getters[col] != null);
Debug.Assert(_getters[column.Index] != null);
throw new InvalidOperationException($"Invalid call to '{nameof(GetGetter)}'");
}
return typedGetter;
Expand All @@ -238,7 +238,7 @@ public void GetValue<TValue>(string kind, ref TValue value)
var column = Schema.GetColumnOrNull(kind);
if (column == null)
throw new InvalidOperationException($"Invalid call to '{nameof(GetValue)}'");
GetGetter<TValue>(column.Value.Index)(ref value);
GetGetter<TValue>(column.Value)(ref value);
}

public override string ToString() => string.Join(", ", Schema.Select(x => x.Name));
Expand Down
8 changes: 5 additions & 3 deletions src/Microsoft.Data.DataView/IDataView.cs
Original file line number Diff line number Diff line change
Expand Up @@ -134,14 +134,16 @@ public abstract class DataViewRow : IDisposable
/// <summary>
/// Returns whether the given column is active in this row.
/// </summary>
public abstract bool IsColumnActive(int col);
public abstract bool IsColumnActive(DataViewSchema.Column column);

/// <summary>
/// Returns a value getter delegate to fetch the given column value from the row.
/// Returns a value getter delegate to fetch the value of the given <paramref name="column"/>, from the row.
/// This throws if the column is not active in this row, or if the type
/// <typeparamref name="TValue"/> differs from this column's type.
/// </summary>
public abstract ValueGetter<TValue> GetGetter<TValue>(int col);
/// <typeparam name="TValue"> is the column's content type.</typeparam>
/// <param name="column"> is the output column whose getter should be returned.</param>
public abstract ValueGetter<TValue> GetGetter<TValue>(DataViewSchema.Column column);

/// <summary>
/// Gets a <see cref="Schema"/>, which provides name and type information for variables
Expand Down
6 changes: 3 additions & 3 deletions src/Microsoft.Data.DataView/SchemaDebuggerProxy.cs
Original file line number Diff line number Diff line change
Expand Up @@ -41,16 +41,16 @@ private static List<KeyValuePair<string, object>> BuildValues(DataViewSchema.Ann
foreach (var column in annotations.Schema)
{
var name = column.Name;
var value = Utils.MarshalInvoke(GetValue<int>, column.Type.RawType, annotations, column.Index);
var value = Utils.MarshalInvoke(GetValue<DataViewSchema.Column>, column.Type.RawType, annotations, column);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataViewSchema.Column [](start = 57, length = 21)

Incidentally this was not necessary or helpful. The point is to merely have a method for the method to sink its teeth into, so the type parameter here was not informative. (If anything it is now more obscure.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not essential -- the method will still work the same even with this change, and there should be no cost to it. Just pointing out that there was no need for it.


In reply to: 263101561 [](ancestors = 263101561)

result.Add(new KeyValuePair<string, object>(name, value));
}
return result;
}

private static object GetValue<T>(DataViewSchema.Annotations annotations, int columnIndex)
private static object GetValue<T>(DataViewSchema.Annotations annotations, DataViewSchema.Column column)
{
T value = default;
annotations.GetGetter<T>(columnIndex)(ref value);
annotations.GetGetter<T>(column)(ref value);
return value;
}
}
Expand Down
17 changes: 15 additions & 2 deletions src/Microsoft.ML.Core/Data/AnnotationUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -460,9 +460,22 @@ public AnnotationRow(DataViewSchema.Annotations annotations)
public override DataViewSchema Schema => _annotations.Schema;
public override long Position => 0;
public override long Batch => 0;
public override ValueGetter<TValue> GetGetter<TValue>(int col) => _annotations.GetGetter<TValue>(col);

/// <summary>
/// Returns a value getter delegate to fetch the value of column with the given columnIndex, from the row.
/// This throws if the column is not active in this row, or if the type
/// <typeparamref name="TValue"/> differs from this column's type.
/// </summary>
/// <typeparam name="TValue"> is the column's content type.</typeparam>
/// <param name="column"> is the output column whose getter should be returned.</param>
public override ValueGetter<TValue> GetGetter<TValue>(DataViewSchema.Column column) => _annotations.GetGetter<TValue>(column);

public override ValueGetter<DataViewRowId> GetIdGetter() => (ref DataViewRowId dst) => dst = default;
public override bool IsColumnActive(int col) => true;

/// <summary>
/// Returns whether the given column is active in this row.
/// </summary>
public override bool IsColumnActive(DataViewSchema.Column column) => true;
}

/// <summary>
Expand Down
5 changes: 2 additions & 3 deletions src/Microsoft.ML.Core/Data/IRowToRowMapper.cs
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,7 @@ public interface IRowToRowMapper

/// <summary>
/// Get an <see cref="DataViewRow"/> with the indicated active columns, based on the input <paramref name="input"/>.
/// The active columns are those for which <paramref name="active"/> returns true. Getting values on inactive
/// columns of the returned row will throw. Null predicates are disallowed.
/// Getting values on inactive columns of the returned row will throw.
///
/// The <see cref="DataViewRow.Schema"/> of <paramref name="input"/> should be the same object as
/// <see cref="InputSchema"/>. Implementors of this method should throw if that is not the case. Conversely,
Expand All @@ -48,6 +47,6 @@ public interface IRowToRowMapper
/// The output <see cref="DataViewRow"/> values are re-computed when requested through the getters. Also, the returned
/// <see cref="DataViewRow"/> will dispose <paramref name="input"/> when it is disposed.
/// </summary>
DataViewRow GetRow(DataViewRow input, Func<int, bool> active);
DataViewRow GetRow(DataViewRow input, IEnumerable<DataViewSchema.Column> activeColumns);
}
}
5 changes: 2 additions & 3 deletions src/Microsoft.ML.Core/Data/ISchemaBindableMapper.cs
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,7 @@ internal interface ISchemaBoundRowMapper : ISchemaBoundMapper

/// <summary>
/// Get an <see cref="DataViewRow"/> with the indicated active columns, based on the input <paramref name="input"/>.
/// The active columns are those for which <paramref name="active"/> returns true. Getting values on inactive
/// columns of the returned row will throw. Null predicates are disallowed.
/// Getting values on inactive columns of the returned row will throw.
///
/// The <see cref="DataViewRow.Schema"/> of <paramref name="input"/> should be the same object as
/// <see cref="InputSchema"/>. Implementors of this method should throw if that is not the case. Conversely,
Expand All @@ -86,6 +85,6 @@ internal interface ISchemaBoundRowMapper : ISchemaBoundMapper
/// The output <see cref="DataViewRow"/> values are re-computed when requested through the getters. Also, the returned
/// <see cref="DataViewRow"/> will dispose <paramref name="input"/> when it is disposed.
/// </summary>
DataViewRow GetRow(DataViewRow input, Func<int, bool> active);
DataViewRow GetRow(DataViewRow input, IEnumerable<DataViewSchema.Column> activeColumns);
}
}
22 changes: 16 additions & 6 deletions src/Microsoft.ML.Core/Data/LinkedRowRootCursorBase.cs
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ namespace Microsoft.ML.Data
/// <summary>
/// A base class for a <see cref="DataViewRowCursor"/> that has an input cursor, but still needs to do work on
/// <see cref="DataViewRowCursor.MoveNext"/>. Note that the default
/// <see cref="LinkedRowRootCursorBase.GetGetter{TValue}(int)"/> assumes that each input column is exposed as an
/// <see cref="LinkedRowRootCursorBase.GetGetter{TValue}(DataViewSchema.Column)"/> assumes that each input column is exposed as an
/// output column with the same column index.
/// </summary>
[BestFriend]
Expand All @@ -29,15 +29,25 @@ protected LinkedRowRootCursorBase(IChannelProvider provider, DataViewRowCursor i
Schema = schema;
}

public sealed override bool IsColumnActive(int col)
/// <summary>
/// Returns whether the given column is active in this row.
/// </summary>
public sealed override bool IsColumnActive(DataViewSchema.Column column)
{
Ch.Check(0 <= col && col < Schema.Count);
return _active == null || _active[col];
Ch.Check(column.Index < Schema.Count);
return _active == null || _active[column.Index];
}

public override ValueGetter<TValue> GetGetter<TValue>(int col)
/// <summary>
/// Returns a value getter delegate to fetch the value of column with the given columnIndex, from the row.
/// This throws if the column is not active in this row, or if the type
/// <typeparamref name="TValue"/> differs from this column's type.
/// </summary>
/// <typeparam name="TValue"> is the column's content type.</typeparam>
/// <param name="column"> is the output column whose getter should be returned.</param>
public override ValueGetter<TValue> GetGetter<TValue>(DataViewSchema.Column column)
{
return Input.GetGetter<TValue>(col);
return Input.GetGetter<TValue>(column);
}
}
}
35 changes: 35 additions & 0 deletions src/Microsoft.ML.Core/Utilities/Utils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -776,6 +776,41 @@ public static T[] BuildArray<T>(int length, Func<int, T> func)
return result;
}

/// <summary>
/// Given a predicate, over a range of values defined by a limit calculate
/// first the values for which that predicate was true, and second an inverse
/// map.
/// </summary>
/// <param name="schema">The input schema where the predicate can check if columns are active.</param>
/// <param name="pred">The predicate to test for various value</param>
/// <param name="map">An ascending array of values from 0 inclusive
/// to <paramref name="schema.Count"/> exclusive, holding all values for which
/// <paramref name="pred"/> is true</param>
/// <param name="invMap">Forms an inverse mapping of <paramref name="map"/>,
/// so that <c><paramref name="invMap"/>[<paramref name="map"/>[i]] == i</c>,
/// and for other entries not appearing in <paramref name="map"/>,
/// <c><paramref name="invMap"/>[i] == -1</c></param>
public static void BuildSubsetMaps(DataViewSchema schema, Func<DataViewSchema.Column, bool> pred, out int[] map, out int[] invMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Func<DataViewSchema.Column, bool> pred [](start = 66, length = 38)

Hmmm. Is the goal here to enable more elaborate checking I suppose? Totally fine, just interesting.

{
Contracts.CheckValue(schema, nameof(schema));
Contracts.Check(schema.Count > 0, nameof(schema));
Contracts.CheckValue(pred, nameof(pred));
// REVIEW: Better names?
List<int> mapList = new List<int>();
invMap = new int[schema.Count];
for (int c = 0; c < schema.Count; ++c)
{
if (!pred(schema[c]))
{
invMap[c] = -1;
continue;
}
invMap[c] = mapList.Count;
mapList.Add(c);
}
map = mapList.ToArray();
}

/// <summary>
/// Given a predicate, over a range of values defined by a limit calculate
/// first the values for which that predicate was true, and second an inverse
Expand Down
Loading