Need to simplify Data Types to simpler/less common .NET types #1386
Comments
Do you have a suggestion in mind @CESARDELATORRE ? |
Maybe something like the following:
The closer it can be to data types in C#, the better: https://docs.microsoft.com/en-us/dotnet/api/system.timezoneinfo?view=netframework-4.7.2 |
I think mapping to .NET types is positive. In the past, we chose to have 2-character descriptions of our types, but I was never a big fan. Maybe @TomFinley was? He was the creator of The one type I am suspicious about is |
This looks good to me. What purpose do the current granular types serve after the work done in #673? |
We will still have most of the list: |
Sounds good! Thank you guys! 👍 |
Is there currently a way to map these to their respective .NET types? I mainly ask since quite a few examples use the |
Any refresh on this issue? I strongly think that the data types we currently have (The list of DataKind types shown above looks "surprising") should be simplified and aligned to regular .NET types, if possible. @eerhardt @danmosemsft @TomFinley @markusweimer , any opinion about it? |
I'd be glad to help but I'm not sure how to map the current Data Kind types to .NET types.
|
Great! We could really use some help here. The overall goal (as I understand it) is that we want to remove the So basically, if you could search for usages of DataKind and remove them and instead use things like Maybe do a file/class at a time and send a PR. Once I'm done doing my current work I was going to start in this area. So it would be good to do small incremental chunks, so we don't overlap too much.
Check out machinelearning/src/Microsoft.ML.Core/Data/DataKind.cs Lines 136 to 175 in 312f9e4
|
I wanted to ping the thread as I see this is in 0.11. After reading #2006, what is this issue tracking for 0.11? Is it to remove DataKind altogether, or is it to internalize DataKind (which mainly looks to be used on TextLoader)? |
I would say we should internalize/remove usages of If we decide that Note - the names we should use should match the |
One point to mention here: when reading from an IEnumerable, if I have int, double, float, or bool data types, they are always mapped to float (DataKind.R4). When I then try to generate the model, it throws a type-mismatch exception, because it expected to find R4 and found something else. I worked around this by converting every type to float, which is not a good thing to do. Is there a way to set the type explicitly, or an attribute that loads the correct type for numeric variables? Best regards |
This is the current DataKind enum:
public enum DataKind : byte
{
/// <summary>1-byte integer, type of <see cref="System.SByte"/>.</summary>
SByte = 1,
/// <summary>1-byte unsigned integer, type of <see cref="System.Byte"/>.</summary>
Byte = 2,
/// <summary>2-byte integer, type of <see cref="System.Int16"/>.</summary>
Int16 = 3,
/// <summary>2-byte unsigned integer, type of <see cref="System.UInt16"/>.</summary>
UInt16 = 4,
/// <summary>4-byte integer, type of <see cref="System.Int32"/>.</summary>
Int32 = 5,
/// <summary>4-byte unsigned integer, type of <see cref="System.UInt32"/>.</summary>
UInt32 = 6,
/// <summary>8-byte integer, type of <see cref="System.Int64"/>.</summary>
Int64 = 7,
/// <summary>8-byte unsigned integer, type of <see cref="System.UInt64"/>.</summary>
UInt64 = 8,
/// <summary>4-byte floating-point number, type of <see cref="System.Single"/>.</summary>
Single = 9,
/// <summary>8-byte floating-point number, type of <see cref="System.Double"/>.</summary>
Double = 10,
/// <summary>string, type of <see cref="System.String"/>.</summary>
String = 11,
/// <summary>boolean variable type, type of <see cref="System.Boolean"/>.</summary>
Boolean = 12,
/// <summary>type of <see cref="System.TimeSpan"/>.</summary>
TimeSpan = 13,
/// <summary>type of <see cref="System.DateTime"/>.</summary>
DateTime = 14,
/// <summary>type of <see cref="System.DateTimeOffset"/>.</summary>
DateTimeOffset = 15,
} |
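Since the question of how each DataKind value maps to a .NET type came up earlier in the thread, here is a minimal sketch of that mapping, taken only from the doc comments in the enum quoted above. The `DataKindMap` class and its `ToClrType` method are hypothetical names used for illustration and are not part of ML.NET. The enum is redeclared locally so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Local copy of the enum quoted above, so the sketch compiles without ML.NET.
public enum DataKind : byte
{
    SByte = 1, Byte = 2, Int16 = 3, UInt16 = 4, Int32 = 5, UInt32 = 6,
    Int64 = 7, UInt64 = 8, Single = 9, Double = 10, String = 11,
    Boolean = 12, TimeSpan = 13, DateTime = 14, DateTimeOffset = 15,
}

// Hypothetical helper (not an ML.NET API): returns the System type that
// each enum member's doc comment refers to.
public static class DataKindMap
{
    private static readonly Dictionary<DataKind, Type> Map = new Dictionary<DataKind, Type>
    {
        [DataKind.SByte] = typeof(sbyte),
        [DataKind.Byte] = typeof(byte),
        [DataKind.Int16] = typeof(short),
        [DataKind.UInt16] = typeof(ushort),
        [DataKind.Int32] = typeof(int),
        [DataKind.UInt32] = typeof(uint),
        [DataKind.Int64] = typeof(long),
        [DataKind.UInt64] = typeof(ulong),
        [DataKind.Single] = typeof(float),
        [DataKind.Double] = typeof(double),
        [DataKind.String] = typeof(string),
        [DataKind.Boolean] = typeof(bool),
        [DataKind.TimeSpan] = typeof(TimeSpan),
        [DataKind.DateTime] = typeof(DateTime),
        [DataKind.DateTimeOffset] = typeof(DateTimeOffset),
    };

    // Throws for values outside the enum's defined range (for example, 0).
    public static Type ToClrType(DataKind kind)
    {
        if (Map.TryGetValue(kind, out Type type))
        {
            return type;
        }
        throw new ArgumentOutOfRangeException(nameof(kind), kind, "Unknown DataKind value.");
    }
}
```

With this sketch, `DataKindMap.ToClrType(DataKind.Single)` yields `typeof(float)`. Because the member names now match the `System` type names, the lookup table is mostly a restatement of the names themselves, which is the point of the rename.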
Yes - I believe our change to the public DataKind enum closes this issue. Please re-open if this isn't the case. |
Fixed by #2661 |
The additional issue is that error messages are still showing those internal data types.. ;) |
In the dynamic API, the data types you need to use when creating the file column schema for loading data are too numerous and are not "standard" .NET types. The list of types is not very "dotnetty":
This is what you see in intellisense:
For instance, for text you have Text, TX, and TXT, and many types, such as R4, are not very clear.
We need to have a simpler and more standard list of data types.