Skip to content

Don't fail in case of const field in Collection source and extended support for type conversion #555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

Fixes #537.
Adds support for multiple basic C# types to convert Dataview <-> collection.

Extend support for basic C# types for DataVIew<->collection conversion.
@Ivanidzo4ka Ivanidzo4ka requested review from TomFinley and Zruty0 July 19, 2018 01:46
@@ -280,11 +280,38 @@ private Action<TRow> GenerateSetter(IRow input, int index, InternalSchemaDefinit
if (fieldType.GetElementType() == typeof(string))
{
Ch.Assert(colType.ItemType.IsText);
return CreateVBufferToStringArraySetter(input, index, poke, peek);
return CreateVBufferSetter<DvText, string>(input, index, poke, peek, (x) => x.ToString());
Copy link
Contributor

@TomFinley TomFinley Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(x) => x.ToString() [](start = 93, length = 19)

Would x => x.ToString() not work for some reason? #Closed

del = CreateDirectSetter<int>;
Ch.Assert(colType.IsBool);
Ch.Assert(peek == null);
return CreateActionSetter<DvBool, bool?>(input, index, poke, (x) => x.IsNA ? (bool?)null : (x.IsTrue ? true : false));
Copy link
Contributor

@TomFinley TomFinley Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x.IsNA ? (bool?)null : (x.IsTrue ? true : false) [](start = 92, length = 48)

We have this here:

public static explicit operator bool?(BL value)

So would a straightforward x => (bool?)x not work, rather than reproducing the logic in this method? #Closed

else if (fieldType.GetElementType() == typeof(bool))
{
Ch.Assert(colType.ItemType.IsBool);
return CreateVBufferSetter<DvBool, bool>(input, index, poke, peek, (x) => Convert.ToBoolean(x.RawValue));
Copy link
Contributor

@TomFinley TomFinley Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(x) => Convert.ToBoolean(x.RawValue) [](start = 91, length = 36)

Similarly, would casts not work here? #Closed

Ch.Assert(colType.ItemType == NumberType.I4);
return CreateVBufferSetter<DvInt4, int>(input, index, poke, peek, (x) => x.RawValue);
}
else if (fieldType.GetElementType() == typeof(short))
Copy link
Contributor

@TomFinley TomFinley Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing this with int, the explicit casts would throw in the case of missing values, whereas things like this will just silently pass int.MinValue. #Closed

Copy link
Contributor

@TomFinley TomFinley Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, I think we do want to throw here, if someone makes the mistake of using a type that doesn't support missing values, over data that actually has missing values. #Closed

@TomFinley
Copy link
Contributor

TomFinley commented Jul 19, 2018

Hi @Ivanidzo4ka I was looking and maybe I'm just missing something obvious, but where is the code that handles the const fields? I mean, we have tests I guess so it must be somewhere, I simply can't find it. :)

@Ivanidzo4ka
Copy link
Contributor Author

Sorry, for all this mess, I start with const field, and get carried away.

it's in SchemaDefinition.cs
// Const fields not need to be mapped.
if (fieldInfo.FieldType == typeof(IChannel) || fieldInfo.IsLiteral)
continue;
IsLiteral is indication of const field, so I just skip it and don't infer column from it.


In reply to: 406310650 [](ancestors = 406310650)

@@ -21,9 +21,10 @@ private static OpCode GetAssignmentOpCode(Type t)
// REVIEW: This should be a Dictionary<Type, OpCode> based solution.
// DvTexts, strings, arrays, and VBuffers.
if (t == typeof(DvInt8) || t == typeof(DvInt4) || t == typeof(DvInt2) || t == typeof(DvInt1) ||
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the comment #Closed

@@ -327,7 +327,8 @@ public static SchemaDefinition Create(Type userType)
// This field does not need a column.
// REVIEW: maybe validate the channel attribute now, instead
// of later at cursor creation.
if (fieldInfo.FieldType == typeof(IChannel))
// Const fields not need to be mapped.
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not need [](start = 31, length = 9)

do not need #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this comment only applies to the IsLiteral clause. I suggest to decouple the 2 clauses and have a different comment for each.

// Const fields not need to be mapped.
if (fieldInfo.IsLiteral)
                    continue;
// Clause to handle the field that may be used to expose the cursor channel. 
// This field does not need a column.
// REVIEW: maybe validate the channel attribute now, instead 
// of later at cursor creation.
if (fieldInfo.FieldType == typeof(IChannel) || fieldInfo.IsLiteral)
                    continue;

In reply to: 203871821 [](ancestors = 203871821)

}
else if (colType.IsVector)
{
// VBuffer<T> -> VBuffer<T>
// REVIEW: Do we care about accomodating VBuffer<string> -> VBuffer<DvText>?
// REVIEW: why it's int and not long?
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// REVIEW: why it's int and not long? [](start = 24, length = 37)

This is a strange place to add this comment.
We are doing MarshalInvoke here, right. We have the 'del' as a generic on some token type (it was historically always int), then in line 283 we make the correct getter. #Closed

}
else if (outputType == typeof(int?))
{
// int -> DvInt4
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int [](start = 31, length = 3)

int? #Closed

}
else if (outputType == typeof(sbyte?))
{
// sbyte -> DvInt1
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sbyte [](start = 31, length = 5)

sbyte? #Closed

{
peek(GetCurrentRowObject(), Position, ref buf);
dst = buf.HasValue ? (DvBool)buf.Value : DvBool.NA;
dst = convert(buf);
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert [](start = 30, length = 7)

If convert is an identity, we are still incurring a delegate method call here, which is not efficient. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have identity converts. even if I pass is x=>x I convert from TSrc to TDst class and call implicit converter. Which I'm not sure how to express through generic classes.
I would like to get rid of them, I just don't know way how to do it.


In reply to: 203875078 [](ancestors = 203875078)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see now. Ok, why don't we call this CreateConvertingGetterDelegate, to make it clear that there's also CreateDirectGetterDelegate?


In reply to: 203876308 [](ancestors = 203876308,203875078)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to recall this conversation with Shon, that any 'helper' mechanism that would introduce these expensive conversions better doesn't exist at all.
I don't agree with it, but it's something to consider: with all but no warning to the user, the pipeline with int will be slower than the pipeline with float because of the conversion involved.

This is part of a conversation about why we even have DvInts.

Maybe a REVIEW comment alongside this converting getter/setter would be in order?

REVIEW: the converting getter invokes a type conversion delegate on every call, so it's inherently slower than the 'direct' getter. We don't have good indication of this to the user, and the selection of affected types is pretty arbitrary (signed integers and bools, but not uints and floats).

In reply to: 203877031 [](ancestors = 203877031,203876308,203875078)

{
// long? -> DvInt8
Ch.Assert(colType == NumberType.I8);
return CreateGetterDelegate<long?, DvInt8>(index, (x) => x.HasValue ? x.Value : DvInt8.NA);
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(x [](start = 78, length = 2)

why not x => x ?? DvInt8.NA here and below? #Closed

@Zruty0
Copy link
Contributor

Zruty0 commented Jul 19, 2018

    public static bool TryGetDataKind(this Type type, out DataKind kind)

Hmm, it seems that this change will have a destructive potential. Could you please verify that we won't run into issues trying to copy a sbyte into a DvInt1 somewhere?

I took a look myself, and feel like we're safe for now, but it would be better to double-check


Refers to: src/Microsoft.ML.Core/Data/DataKind.cs:183 in 49323f2. [](commit_id = 49323f2, deletion_comment = False)

return true;
if (x == null && y != null)
return false;
return (x.Equals(y));
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( [](start = 19, length = 1)

parentheses unnecessary #Closed

return (x.Equals(y));
}

public bool CompareThrougReflection<T>(T x, T y)
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throug [](start = 27, length = 6)

Via. Or at least 'through' #Closed

}

[Fact]
public void BackAndForthConversionWithBasicTypes()
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackAndForth [](start = 20, length = 12)

'RoundTrip' is a more widely accepted term I think #Closed

var dataNullable = new List<ConversionNullalbeClass>()
{
new ConversionNullalbeClass(){ fInt=int.MaxValue-1, fuInt=uint.MaxValue-1, fBool=true, fsByte=sbyte.MaxValue-1, fByte = byte.MaxValue-1,
fDouble =double.MaxValue-1, fFloat=float.MaxValue-1, fLong=long.MaxValue-1, fuLong = ulong.MaxValue-1,
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= [](start = 27, length = 2)

Could you Ctrl-K-D for proper spacing? Or just, generally, invoke magic for proper spacing? #Closed

{
gotException = true;
}
Assert.True(gotException);
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.True(gotException); [](start = 20, length = 26)

is this the idiom?
I'd prefer something along the lines of

try
{
  MoveNext();
  Assert.Fail();
}
catch
{
  Assert.True(true);
}
``` #Closed

}
Assert.True(!enumeratorNullable.MoveNext() && !originalNullalbleEnumerator.MoveNext());
}
}
Copy link
Contributor

@Zruty0 Zruty0 Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} [](start = 8, length = 1)

Good job on the new tests. I assume you ran test coverage of them, right? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, all new changes are covered


In reply to: 203883396 [](ancestors = 203883396)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka
Copy link
Contributor Author

@dotnet-bot test Windows_NT Release

@TomFinley
Copy link
Contributor

Ah I see, thank you @Ivanidzo4ka. (Out of curiosity, readonly fields? Static fields?)

Also there are two possible things we could consider as reasonable for const fields. We could ignore them, or just produce a constant column at least, when it is an input, and force them to use the NoColumn attribute on any field (no matter its modifiers) to suppress. Not sure if you've thought about it.


In reply to: 406343983 [](ancestors = 406343983,406310650)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Zruty0
Copy link
Contributor

Zruty0 commented Jul 23, 2018

Producing a constant column is, I think, reasonable. Then again, we currently do not have too different code for reading and writing, and I think there's value to keep it this way. If we cannot read it back, it's reasonable not to allow writing it either.

As for forcing [NoColumn] : I would prefer to not have a mandatory attribute like this. For reading the fields back, we allow two mechanisms for 'non-column' fields: a [NoColumn] attribute, or a global ignoreMissingColumns parameter. I think it would be better to do what Ivan does now to consts (which is the same as what we currently do for propertiers).


In reply to: 406766414 [](ancestors = 406766414,406343983,406310650)

@Ivanidzo4ka Ivanidzo4ka merged commit 015a15e into dotnet:master Jul 23, 2018
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
…upport for type conversion (dotnet#555)

* Don't fail in case of const field in Collection source.
Extend support for basic C# types for DataVIew<->collection conversion.
codemzs pushed a commit to codemzs/machinelearning that referenced this pull request Aug 1, 2018
…upport for type conversion (dotnet#555)

* Don't fail in case of const field in Collection source.
Extend support for basic C# types for DataVIew<->collection conversion.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants