Clean up the SchemaDefinition class #2995
public Column(IExceptionContext ectx, string memberName, DataViewType columnType,
    string columnName = null, IEnumerable<AnnotationInfo> annotationInfos = null, Delegate generator = null)
public Column(string memberName, DataViewType columnType,
On "public": I would like to make this constructor internal as well. Is there any scenario where users would want to add new columns to a SchemaDefinition generated from the type?
I think so. In fact, I think that's one of the major scenarios, probably the central reason people need this structure, isn't it? You start with an empty schema definition, then add columns to it to describe how you want fields mapped to columns, and so on. This object exists in large part for situations where you don't, or somehow can't, rely purely on the reflection-based mechanism.
In reply to: 266437352
I was thinking you start with a schema that's auto-generated from the type, then modify the columns that are already in it. The schema generated from the type contains all the possible columns; why would we need to add new columns (as opposed to modifying existing ones)?
In reply to: 266605420
Codecov Report
@@            Coverage Diff             @@
##           master    #2995      +/-   ##
==========================================
+ Coverage   72.41%   72.43%   +0.02%
==========================================
  Files         803      804       +1
  Lines      143851   143916      +65
  Branches    16173    16173
==========================================
+ Hits       104171   104250      +79
+ Misses      35258    35250       -8
+ Partials     4422     4416       -6
namespace Microsoft.ML.Functional.Tests
{
    public class PredictionEngineScenarios : BaseTestClass
(nit) I would give the file and the class the same name. #Resolved
base.Initialize();
_ml = new MLContext(42);
_ml.AddStandardComponents();
Do you need to call AddStandardComponents? That should only be necessary when you are doing things like using the MAML syntax. When you are strictly using the API, it shouldn't be necessary.
2023059 to de633a3
}
}
}
private SchemaDefinition()
On "SchemaDefinition": So, why is this private? I'm thinking about how I'd like to use it. I have my class, I create a new, empty schema definition, then I populate the mapping. Is there any other way to create an empty one of these guys?
It appears that not only is there no way to create an empty one, but there is no longer a way to add a new column to it, since the column constructor is now internal.
Are we sure there are no scenarios that need to do this?
Is this actually the only way to create a schema definition? As far as I can see, it auto-populates everything, which is sort of the opposite of what someone actually trying to use this structure would want most of the time, since the entire reason someone creates this thing is to be explicit about the DataView type mapping. Refers to: src/Microsoft.ML.Data/Data/SchemaDefinition.cs:326 in de633a3.
Maybe if you hadn't made the constructor private, that would be sufficient. In reply to: 474069107. Refers to: src/Microsoft.ML.Data/Data/SchemaDefinition.cs:326 in de633a3.
Hi @yaeldekel, thanks for working on this. The primary reason this thing exists is to let people be explicit about the mapping of columns. While many of the internalizations and cleanups were appropriate, we took out the ability to construct an empty one, which is the primary use case. So we ought to add that back in.
9e7dd80 to cbc96e7
Thanks @yaeldekel!
fb6481b to 0598c9f
Fixes #2978.