-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Don't fail in case of const field in Collection source and extended support for type conversion #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't fail in case of const field in Collection source and extended support for type conversion #555
Conversation
Extend support for basic C# types for DataVIew<->collection conversion.
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
@@ -280,11 +280,38 @@ private Action<TRow> GenerateSetter(IRow input, int index, InternalSchemaDefinit | |||
if (fieldType.GetElementType() == typeof(string)) | |||
{ | |||
Ch.Assert(colType.ItemType.IsText); | |||
return CreateVBufferToStringArraySetter(input, index, poke, peek); | |||
return CreateVBufferSetter<DvText, string>(input, index, poke, peek, (x) => x.ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(x) => x.ToString() [](start = 93, length = 19)
Would x => x.ToString()
not work for some reason? #Closed
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
del = CreateDirectSetter<int>; | ||
Ch.Assert(colType.IsBool); | ||
Ch.Assert(peek == null); | ||
return CreateActionSetter<DvBool, bool?>(input, index, poke, (x) => x.IsNA ? (bool?)null : (x.IsTrue ? true : false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x.IsNA ? (bool?)null : (x.IsTrue ? true : false) [](start = 92, length = 48)
We have this here:
public static explicit operator bool?(BL value) |
So would a straightforward x => (bool?)x
not work, rather than reproducing the logic in this method? #Closed
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
else if (fieldType.GetElementType() == typeof(bool)) | ||
{ | ||
Ch.Assert(colType.ItemType.IsBool); | ||
return CreateVBufferSetter<DvBool, bool>(input, index, poke, peek, (x) => Convert.ToBoolean(x.RawValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(x) => Convert.ToBoolean(x.RawValue) [](start = 91, length = 36)
Similarly, would casts not work here? #Closed
src/Microsoft.ML.Api/TypedCursor.cs
Outdated
Ch.Assert(colType.ItemType == NumberType.I4); | ||
return CreateVBufferSetter<DvInt4, int>(input, index, poke, peek, (x) => x.RawValue); | ||
} | ||
else if (fieldType.GetElementType() == typeof(short)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're doing this with int
, the explicit casts would throw in the case of missing values, whereas things like this will just silently pass int.MinValue
. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, I think we do want to throw here, if someone makes the mistake of using a type that doesn't support missing values, over data that actually has missing values. #Closed
Hi @Ivanidzo4ka I was looking and maybe I'm just missing something obvious, but where is the code that handles the const fields? I mean, we have tests I guess so it must be somewhere, I simply can't find it. :) |
Sorry, for all this mess, I start with const field, and get carried away. it's in SchemaDefinition.cs In reply to: 406310650 [](ancestors = 406310650) |
src/Microsoft.ML.Api/ApiUtils.cs
Outdated
@@ -21,9 +21,10 @@ private static OpCode GetAssignmentOpCode(Type t) | |||
// REVIEW: This should be a Dictionary<Type, OpCode> based solution. | |||
// DvTexts, strings, arrays, and VBuffers. | |||
if (t == typeof(DvInt8) || t == typeof(DvInt4) || t == typeof(DvInt2) || t == typeof(DvInt1) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the comment #Closed
@@ -327,7 +327,8 @@ public static SchemaDefinition Create(Type userType) | |||
// This field does not need a column. | |||
// REVIEW: maybe validate the channel attribute now, instead | |||
// of later at cursor creation. | |||
if (fieldInfo.FieldType == typeof(IChannel)) | |||
// Const fields not need to be mapped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not need [](start = 31, length = 9)
do not need #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this comment only applies to the IsLiteral clause. I suggest to decouple the 2 clauses and have a different comment for each.
// Const fields not need to be mapped.
if (fieldInfo.IsLiteral)
continue;
// Clause to handle the field that may be used to expose the cursor channel.
// This field does not need a column.
// REVIEW: maybe validate the channel attribute now, instead
// of later at cursor creation.
if (fieldInfo.FieldType == typeof(IChannel) || fieldInfo.IsLiteral)
continue;
In reply to: 203871821 [](ancestors = 203871821)
} | ||
else if (colType.IsVector) | ||
{ | ||
// VBuffer<T> -> VBuffer<T> | ||
// REVIEW: Do we care about accomodating VBuffer<string> -> VBuffer<DvText>? | ||
// REVIEW: why it's int and not long? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// REVIEW: why it's int and not long? [](start = 24, length = 37)
This is a strange place to add this comment.
We are doing MarshalInvoke here, right. We have the 'del' as a generic on some token type (it was historically always int), then in line 283 we make the correct getter. #Closed
} | ||
else if (outputType == typeof(int?)) | ||
{ | ||
// int -> DvInt4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int [](start = 31, length = 3)
int? #Closed
} | ||
else if (outputType == typeof(sbyte?)) | ||
{ | ||
// sbyte -> DvInt1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sbyte [](start = 31, length = 5)
sbyte? #Closed
{ | ||
peek(GetCurrentRowObject(), Position, ref buf); | ||
dst = buf.HasValue ? (DvBool)buf.Value : DvBool.NA; | ||
dst = convert(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert [](start = 30, length = 7)
If convert
is an identity, we are still incurring a delegate method call here, which is not efficient. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have identity converts. even if I pass is x=>x I convert from TSrc to TDst class and call implicit converter. Which I'm not sure how to express through generic classes.
I would like to get rid of them, I just don't know way how to do it.
In reply to: 203875078 [](ancestors = 203875078)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see now. Ok, why don't we call this CreateConvertingGetterDelegate
, to make it clear that there's also CreateDirectGetterDelegate
?
In reply to: 203876308 [](ancestors = 203876308,203875078)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to recall this conversation with Shon, that any 'helper' mechanism that would introduce these expensive conversions better doesn't exist at all.
I don't agree with it, but it's something to consider: with all but no warning to the user, the pipeline with int
will be slower than the pipeline with float
because of the conversion involved.
This is part of a conversation about why we even have DvInts.
Maybe a REVIEW comment alongside this converting getter/setter would be in order?
REVIEW: the converting getter invokes a type conversion delegate on every call, so it's inherently slower than the 'direct' getter. We don't have good indication of this to the user, and the selection of affected types is pretty arbitrary (signed integers and bools, but not uints and floats).
In reply to: 203877031 [](ancestors = 203877031,203876308,203875078)
{ | ||
// long? -> DvInt8 | ||
Ch.Assert(colType == NumberType.I8); | ||
return CreateGetterDelegate<long?, DvInt8>(index, (x) => x.HasValue ? x.Value : DvInt8.NA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(x [](start = 78, length = 2)
why not x => x ?? DvInt8.NA
here and below? #Closed
Hmm, it seems that this change will have a destructive potential. Could you please verify that we won't run into issues trying to copy a I took a look myself, and feel like we're safe for now, but it would be better to double-check Refers to: src/Microsoft.ML.Core/Data/DataKind.cs:183 in 49323f2. [](commit_id = 49323f2, deletion_comment = False) |
return true; | ||
if (x == null && y != null) | ||
return false; | ||
return (x.Equals(y)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( [](start = 19, length = 1)
parentheses unnecessary #Closed
return (x.Equals(y)); | ||
} | ||
|
||
public bool CompareThrougReflection<T>(T x, T y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throug [](start = 27, length = 6)
Via. Or at least 'through' #Closed
} | ||
|
||
[Fact] | ||
public void BackAndForthConversionWithBasicTypes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BackAndForth [](start = 20, length = 12)
'RoundTrip' is a more widely accepted term I think #Closed
var dataNullable = new List<ConversionNullalbeClass>() | ||
{ | ||
new ConversionNullalbeClass(){ fInt=int.MaxValue-1, fuInt=uint.MaxValue-1, fBool=true, fsByte=sbyte.MaxValue-1, fByte = byte.MaxValue-1, | ||
fDouble =double.MaxValue-1, fFloat=float.MaxValue-1, fLong=long.MaxValue-1, fuLong = ulong.MaxValue-1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= [](start = 27, length = 2)
Could you Ctrl-K-D for proper spacing? Or just, generally, invoke magic for proper spacing? #Closed
{ | ||
gotException = true; | ||
} | ||
Assert.True(gotException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert.True(gotException); [](start = 20, length = 26)
is this the idiom?
I'd prefer something along the lines of
try
{
MoveNext();
Assert.Fail();
}
catch
{
Assert.True(true);
}
``` #Closed
} | ||
Assert.True(!enumeratorNullable.MoveNext() && !originalNullalbleEnumerator.MoveNext()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} [](start = 8, length = 1)
Good job on the new tests. I assume you ran test coverage of them, right? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dotnet-bot test Windows_NT Release |
Ah I see, thank you @Ivanidzo4ka. (Out of curiosity, readonly fields? Static fields?) Also there are two possible things we could consider as reasonable for In reply to: 406343983 [](ancestors = 406343983,406310650) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Producing a constant column is, I think, reasonable. Then again, we currently do not have too different code for reading and writing, and I think there's value to keep it this way. If we cannot read it back, it's reasonable not to allow writing it either. As for forcing In reply to: 406766414 [](ancestors = 406766414,406343983,406310650) |
…upport for type conversion (dotnet#555) * Don't fail in case of const field in Collection source. Extend support for basic C# types for DataVIew<->collection conversion.
…upport for type conversion (dotnet#555) * Don't fail in case of const field in Collection source. Extend support for basic C# types for DataVIew<->collection conversion.
Fixes #537.
Adds support for multiple basic C# types to convert Dataview <-> collection.