Simple IDataView implementation sample. #3302

TomFinley · 2019-04-11T23:06:15Z

TomFinley · 2019-04-11T23:09:29Z

docs/samples/Microsoft.ML.Samples/Dynamic/SimpleDataViewImplementation.cs

+        }
+
+        /// <summary>
+        /// This is an implementation of <see cref="IDataView"/> that wraps an <see cref="IEnumerable{T}"/>


In my first attempt at this, I had written much of this material as separate XML docs for each member, but I found that this made it hard to tell a connected "story" about what is going on here. So I stuck to one big block on the IDataView implementation and one block on the DataViewRowCursor derived class. This made it easier to tell a cohesive story, while also, I think, looking a lot less busy.

They are XML comments so I can use `<see>` tags, mostly so that someone inspecting this sample can sometimes F12 to see what I'm talking about...

A few points:

The docs engine will paste these examples verbatim, so none of the XML docstrings will be parsed and the F12 for cref won't work as you hoped.

If there's a way to put these in the section for IDataView we can get all the xml features like cref. But you're writing a story that's attached to specific code blocks, so lumping it all in would probably break the story.

Ideally this would be written as a mixed XML + code page like the one below, but those pages are editorial content in a different repo and are updated manually.
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/inspect-intermediate-data-ml-net

Having said all that, I think this works for V1 as is, with the only caveat that the xml features like cref won't work as you intended.

Very good points @shmoradims. Thanks for keeping track of this. Fair enough. We can think about that later. TBH I find the documentation system perhaps more confusing than other parts of our infrastructure, so I'm grateful to have your guidance in this area.

I'm still not sure why the `<see>` tags wouldn't work if someone were to try to run the sample in VS (certainly we reference types outside of our own library all the time, e.g., lots of things in the BCL), but we can try, as you say, to work that out later.
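For readers following this thread, here is a minimal sketch of the two comment styles under discussion. The type names `XmlDocumented` and `PlainCommented` are hypothetical and do not come from the sample; the `<see cref="IEnumerable{T}"/>` reference mirrors the one in the diff above and uses a BCL type so the snippet needs nothing beyond the standard library.

```csharp
using System.Collections.Generic;

namespace Samples.Dynamic
{
    /// <summary>
    /// XML doc comment style: <see cref="IEnumerable{T}"/> is a symbol
    /// reference, so an IDE that opens this file can resolve it and navigate
    /// to the definition.
    /// </summary>
    public static class XmlDocumented { }

    // Plain comment style: IEnumerable<T> here is only text. It reads the same
    // when a docs engine pastes the file verbatim, but there is no symbol for
    // an IDE to navigate to.
    public static class PlainCommented { }
}
```

Either style renders identically once the file is pasted as-is into a docs page; the difference only shows up for someone who opens the sample in an editor.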

TomFinley · 2019-04-11T23:11:42Z

docs/samples/Microsoft.ML.Samples/Microsoft.ML.Samples.csproj

@@ -5,6 +5,7 @@
    <OutputType>Exe</OutputType>
    <SignAssembly>false</SignAssembly>
    <PublicSign>false</PublicSign>
+    <RootNamespace>Samples</RootNamespace>


Thought I might opportunistically do this, since our new policy per #3205 is that samples should not have Microsoft.ML in their namespace, but by default we were still doing the "wrong" thing whenever we created a new file. I think it would be nice to avoid the problem by default.
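As a rough sketch of what this setting changes: assuming the usual Visual Studio behavior of deriving a new file's default namespace from `RootNamespace` plus the folder path, a new file created under `Dynamic/` would start out as shown below. The class name `NewSample` is hypothetical.

```csharp
// With <RootNamespace>Samples</RootNamespace> in the project file, a new file
// created under Dynamic/ is assumed to get this namespace by default, rather
// than one that begins with Microsoft.ML.
namespace Samples.Dynamic
{
    public static class NewSample
    {
        public static void Example()
        {
            // Sample body goes here.
        }
    }
}
```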

TomFinley · 2019-04-11T23:12:34Z

Not sure offhand who the typical sample reviewers are... @shmoradims and @sfilipi are obvious, but beyond that not entirely certain. Maybe @zeahmed ?

docs/samples/Microsoft.ML.Samples/Dynamic/SimpleDataViewImplementation.cs

codecov · 2019-04-11T23:54:14Z

Codecov Report

Merging #3302 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3302      +/-   ##
==========================================
+ Coverage   72.63%   72.63%   +<.01%     
==========================================
  Files         807      807              
  Lines      145127   145127              
  Branches    16219    16219              
==========================================
+ Hits       105413   105417       +4     
+ Misses      35296    35293       -3     
+ Partials     4418     4417       -1

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | `72.63% <ø> (ø)` | ⬆️ |
| #production | `68.17% <ø> (ø)` | ⬆️ |
| #test | `88.95% <ø> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | `89.89% <0%> (+0.62%)` | ⬆️ |

codecov · 2019-04-11T23:54:32Z

Codecov Report

Merging #3302 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3302      +/-   ##
==========================================
+ Coverage   72.63%   72.65%   +0.01%     
==========================================
  Files         807      807              
  Lines      145127   145190      +63     
  Branches    16219    16223       +4     
==========================================
+ Hits       105413   105485      +72     
+ Misses      35296    35290       -6     
+ Partials     4418     4415       -3

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | `72.65% <ø> (+0.01%)` | ⬆️ |
| #production | `68.17% <ø> (ø)` | ⬆️ |
| #test | `88.97% <ø> (+0.02%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Microsoft.ML.DataView/IDataView.cs | `100% <ø> (ø)` | ⬆️ |
| src/Microsoft.ML.Recommender/RecommenderCatalog.cs | `70.83% <0%> (ø)` | ⬆️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | `84.9% <0%> (+0.2%)` | ⬆️ |
| ...StandardTrainers/Standard/LinearModelParameters.cs | `60.31% <0%> (+0.26%)` | ⬆️ |
| ...ests/TrainerEstimators/MatrixFactorizationTests.cs | `97.84% <0%> (+0.43%)` | ⬆️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | `89.89% <0%> (+0.62%)` | ⬆️ |
| src/Microsoft.ML.Maml/MAML.cs | `26.21% <0%> (+1.45%)` | ⬆️ |

artidoro · 2019-04-11T23:57:19Z

Looks great! Is this something that we want to link in our API docs? Or should it be somewhere else in the ML.NET documentation website?
I am thinking it could be nice to have this as an example under the IDataView interface, but maybe there are better places or ways to include it in the documentation.

artidoro

TomFinley · 2019-04-12T01:12:30Z

> Looks great! Is this something that we want to link in our API docs? Or should it be somewhere else in the ML.NET documentation website?
> I am thinking it could be nice to have this as an example under the IDataView interface, but maybe there are better places or ways to include it in the documentation.

Sure @artidoro, I can try to do that... I don't actually know how, so I just did some copy-pasta from one of @rogancarr's PRs and adapted it. Let me know, if you get a chance, whether my IDataView.cs changes in the next commit look good.

sfilipi · 2019-04-12T05:43:27Z

docs/samples/Microsoft.ML.Samples/Dynamic/SimpleDataViewImplementation.cs

+    /// to create pre-baked implementations, it is also useful to know how to create one completely from scratch. We also
+    /// take this opportunity to illustrate and motivate the basic principles of how the IDataView system is architected,
+    /// since people interested in implementing <see cref="IDataView"/> need at least some knowledge of those principles.
+    /// </summary>


We're using simple, non-XML comments, since the sample is displayed as-is.

+1

But why? Maybe we've had wildly different experiences, but I've generally observed that people don't read samples as an intellectual exercise from beginning to end. They might read a little, but they immediately fall to running them to see what the heck is going on. At least, such has been my observation. It's definitely true for me.

You're the expert, but I don't know that I like this idea of making the documentation non-browsable when someone actually tries to run the sample. Perhaps that's what you're doing, but I would prefer to have some understanding of why it's actually a good idea.

sfilipi · 2019-04-12T05:49:16Z

docs/samples/Microsoft.ML.Samples/Dynamic/SimpleDataViewImplementation.cs

+                // while `tokensValue` is logically presented as a three element array, internally you will
+                // see that the arrays internal to that structure have (at least) four items, specifically:
+                // `Masterfully`, `done`, `hero!`, `listen.`. In this way we see a simple example of the details
+                // of how buffer sharing from one iteration to the next actually works.


next actually works [](start = 66, length = 20)

Do you want to add a line about the Length?

You mean describe that `Masterfully`, `done`, `hero!` has three items, but `Stay`, `awhile`, `and`, `listen.` has four items? I know I usually err on the side of overdescription, so this may seem a bit out of character for me to say, but I kind of feel like this particular point is so completely obvious that including it explicitly in the narrative I've structured serves more to de-focus that narrative than to clarify it. But maybe you mean something else?
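The point about logical length versus physical array length can be illustrated with a small self-contained sketch. This is not the sample's code and does not use ML.NET's buffer types: `TokenBuffer` and `Fill` are hypothetical stand-ins, and the order of the two rows is an assumption chosen to match the array contents quoted in the comment above.

```csharp
using System;

// Hypothetical stand-in for a reusable buffer: a backing array plus a logical length.
public struct TokenBuffer
{
    public string[] Values; // backing storage, may be longer than Length
    public int Length;      // number of logically valid items
}

public static class BufferSharingSketch
{
    // Copies `tokens` into `dst`, reusing the existing array when it is large enough.
    public static void Fill(string[] tokens, ref TokenBuffer dst)
    {
        if (dst.Values == null || dst.Values.Length < tokens.Length)
            dst.Values = new string[tokens.Length];
        Array.Copy(tokens, dst.Values, tokens.Length);
        dst.Length = tokens.Length;
    }

    public static void Main()
    {
        var buffer = default(TokenBuffer);
        Fill(new[] { "Stay", "awhile", "and", "listen." }, ref buffer);
        Fill(new[] { "Masterfully", "done", "hero!" }, ref buffer);

        // Logical view: three items.
        Console.WriteLine(buffer.Length);
        // Physical view: the four-element array allocated for the first row is
        // reused, so its last slot still holds the stale "listen.".
        Console.WriteLine(string.Join(", ", buffer.Values));
    }
}
```

Running `Main` prints `3` followed by `Masterfully, done, hero!, listen.`, which is why a consumer should only ever read the first `Length` items.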

shmoradims

Simple IDataView implementation sample.

309c118

TomFinley added the documentation label (Related to documentation of ML.NET) Apr 11, 2019

TomFinley requested review from eerhardt, artidoro, shmoradims and sfilipi April 11, 2019 23:06

TomFinley self-assigned this Apr 11, 2019

artidoro reviewed Apr 11, 2019

artidoro reviewed Apr 11, 2019

artidoro reviewed Apr 11, 2019

artidoro reviewed Apr 11, 2019

artidoro approved these changes Apr 11, 2019

Artidoro review comments.

c7ae37f

sfilipi reviewed Apr 12, 2019

shmoradims approved these changes Apr 12, 2019

TomFinley merged commit 326727f into dotnet:master Apr 12, 2019

TomFinley deleted the SimpleDVSample branch April 12, 2019 18:09

TomFinley added a commit to TomFinley/machinelearning that referenced this pull request Apr 12, 2019

Simple IDataView implementation sample. (dotnet#3302)

2388735

ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
