Simple IDataView implementation sample. #3302

Merged
TomFinley merged 2 commits into dotnet:master from SimpleDVSample on Apr 12, 2019

Conversation

TomFinley
Contributor

Fixes #3301.

@TomFinley TomFinley added the documentation Related to documentation of ML.NET label Apr 11, 2019
@TomFinley TomFinley self-assigned this Apr 11, 2019
}

/// <summary>
/// This is an implementation of <see cref="IDataView"/> that wraps an <see cref="IEnumerable{T}"/>
Contributor Author

In my first attempt at this, I had written much of this material as separate XML docs for each member, but I found that this made it hard to tell a connected "story" about what is going on here, so I stuck to one big block on the IDataView implementation and one block on the DataViewRowCursor derived class. This made it easier to tell a cohesive story, while also, I think, making it look a lot less busy.

They are XML comments so I can use <see tags, mostly so that someone inspecting this sample can sometimes F12 to see what I'm talking about...

@shmoradims shmoradims Apr 12, 2019

A few points:

  1. The docs engine will paste these examples verbatim, so none of the XML docstrings will be parsed and F12 on a cref won't work as you hoped.

  2. If there's a way to put these in the section for IDataView we can get all the xml features like cref. But you're writing a story that's attached to specific code blocks, so lumping it all in would probably break the story.

  3. This would ideally be written as a mixed XML + code page like the one below, but those pages are editorial content in a different repo and are updated manually.
    https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/inspect-intermediate-data-ml-net

Having said all that, I think this works for V1 as is, with the only caveat that the xml features like cref won't work as you intended.


Contributor Author

Very good points @shmoradims. Thanks for keeping track of this. Fair enough. We can think about that later. TBH I find the documentation system perhaps more confusing than other parts of our infrastructure, so I'm grateful to have your guidance in this area.

I'm still not sure why the <see tags wouldn't work if someone were to try to run the sample in VS -- certainly we reference types outside of our own library all the time, e.g., lots of things in the BCL -- but we can try as you say to work that out later.

@@ -5,6 +5,7 @@
<OutputType>Exe</OutputType>
<SignAssembly>false</SignAssembly>
<PublicSign>false</PublicSign>
<RootNamespace>Samples</RootNamespace>
Contributor Author

Thought I might opportunistically do this, since our new policy per #3205 is that samples not have Microsoft.ML in their namespace, but by default we were still doing the "wrong" thing when we create a new file. I think it might be nice if we avoided the problem by default.

@TomFinley
Contributor Author

TomFinley commented Apr 11, 2019

Not sure offhand who the typical sample reviewers are... @shmoradims and @sfilipi are obvious, but beyond that not entirely certain. Maybe @zeahmed ?

@codecov

codecov bot commented Apr 11, 2019

Codecov Report

Merging #3302 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3302      +/-   ##
==========================================
+ Coverage   72.63%   72.63%   +<.01%     
==========================================
  Files         807      807              
  Lines      145127   145127              
  Branches    16219    16219              
==========================================
+ Hits       105413   105417       +4     
+ Misses      35296    35293       -3     
+ Partials     4418     4417       -1

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | 72.63% <ø> (ø) | ⬆️ |
| #production | 68.17% <ø> (ø) | ⬆️ |
| #test | 88.95% <ø> (ø) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.89% <0%> (+0.62%) | ⬆️ |

@codecov

codecov bot commented Apr 11, 2019

Codecov Report

Merging #3302 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3302      +/-   ##
==========================================
+ Coverage   72.63%   72.65%   +0.01%     
==========================================
  Files         807      807              
  Lines      145127   145190      +63     
  Branches    16219    16223       +4     
==========================================
+ Hits       105413   105485      +72     
+ Misses      35296    35290       -6     
+ Partials     4418     4415       -3

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | 72.65% <ø> (+0.01%) | ⬆️ |
| #production | 68.17% <ø> (ø) | ⬆️ |
| #test | 88.97% <ø> (+0.02%) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Microsoft.ML.DataView/IDataView.cs | 100% <ø> (ø) | ⬆️ |
| src/Microsoft.ML.Recommender/RecommenderCatalog.cs | 70.83% <0%> (ø) | ⬆️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 84.9% <0%> (+0.2%) | ⬆️ |
| ...StandardTrainers/Standard/LinearModelParameters.cs | 60.31% <0%> (+0.26%) | ⬆️ |
| ...ests/TrainerEstimators/MatrixFactorizationTests.cs | 97.84% <0%> (+0.43%) | ⬆️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.89% <0%> (+0.62%) | ⬆️ |
| src/Microsoft.ML.Maml/MAML.cs | 26.21% <0%> (+1.45%) | ⬆️ |

@artidoro
Contributor

artidoro commented Apr 11, 2019

Looks great! Is this something that we want to link in our API docs? Or should it be somewhere else in the ML.NET documentation website?
I am thinking it could be nice to have this as an example under the IDataView interface, but maybe there are better places or ways to include it in the documentation.

Contributor

@artidoro artidoro left a comment

:shipit:

@TomFinley
Contributor Author

> Looks great! Is this something that we want to link in our API docs? Or should it be somewhere else in the ML.NET documentation website?
> I am thinking it could be nice to have this as an example under the IDataView interface, but maybe there are better places or ways to include it in the documentation.

Sure @artidoro, I can try to do that... I don't actually know how, so I just did some copy-pasta from one of @rogancarr's PRs and adapted it. Let me know, if you get a chance, whether my IDataView.cs changes in the next commit look good.

/// to create pre-baked implementations, it is also useful to know how to create one completely from scratch. We also
/// take this opportunity to illustrate and motivate the basic principles of how the IDataView system is architected,
/// since people interested in implementing <see cref="IDataView"/> need at least some knowledge of those principles.
/// </summary>
Member

We're using simple, non-XML comments since the sample is displayed as-is.

+1


Contributor Author

But why? Maybe we've had wildly different experiences, but I've generally observed that people don't read samples as an intellectual exercise from beginning to end. They might read a little, but they immediately fall to running them and seeing what the heck is going on. At least, such has been my observation. It's definitely true for me.

You're the expert, but I don't know that I like this idea of making the documentation non-browsable if someone actually tries to run the sample. Perhaps that's what you're doing, but I might prefer to have some understanding of why it's actually a good idea.

// while `tokensValue` is logically presented as a three element array, internally you will
// see that the arrays internal to that structure have (at least) four items, specifically:
// `Masterfully`, `done`, `hero!`, `listen.`. In this way we see a simple example of the details
// of how buffer sharing from one iteration to the next actually works.
Member

Do you want to add a line about the Length?

Contributor Author

You mean describe that Masterfully, done, hero! has three items, but Stay, awhile, and, listen has four items? I know I usually err on the side of overdescription, so this may seem a bit out of character for me to say, but I kind of feel like this particular point is so completely obvious that including it explicitly in the narrative I've structured serves more to de-focus that narrative than to clarify it. But maybe you mean something else?
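
For readers following this thread without the sample open, the buffer sharing discussed above can be sketched roughly as follows. This is not the code from the PR and it does not use ML.NET: `TokenBuffer`, `TokenGetter`, and `MakeGetter` are hypothetical stand-ins for a vector value with a logical length, a by-reference getter, and a cursor's getter factory. The four-token first row is an assumption based on the items named in the comments above.

```csharp
using System;

// Hypothetical stand-in for a variable-length vector value: a logical length
// plus a backing array that may be longer than that length.
public struct TokenBuffer
{
    public string[] Items;   // backing storage, possibly larger than Length
    public int Length;       // number of logically valid items
}

// Getter delegate that fills a caller-owned value by reference.
public delegate void TokenGetter(ref TokenBuffer value);

public static class BufferSharingSketch
{
    // Builds a getter that writes `tokens` into the caller's buffer,
    // allocating a new array only when the existing one is too small.
    public static TokenGetter MakeGetter(string[] tokens)
    {
        return (ref TokenBuffer value) =>
        {
            if (value.Items == null || value.Items.Length < tokens.Length)
                value.Items = new string[tokens.Length];
            Array.Copy(tokens, value.Items, tokens.Length);
            value.Length = tokens.Length;
        };
    }

    public static void Main()
    {
        var buffer = default(TokenBuffer);

        // Assumed first row: four tokens, so a four-element array is allocated.
        MakeGetter(new[] { "Stay", "awhile", "and", "listen." })(ref buffer);

        // Second row: three tokens, written into the same four-element array.
        MakeGetter(new[] { "Masterfully", "done", "hero!" })(ref buffer);

        Console.WriteLine(buffer.Length);                    // 3
        Console.WriteLine(buffer.Items.Length);              // 4
        Console.WriteLine(string.Join(", ", buffer.Items));  // Masterfully, done, hero!, listen.
    }
}
```

The leftover `listen.` in the fourth slot is stale data from the previous row. A consumer is expected to read only the first `Length` items, which is why the logical view has three elements even though the array has four.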

@shmoradims shmoradims left a comment

:shipit:

@TomFinley TomFinley merged commit 326727f into dotnet:master Apr 12, 2019
@TomFinley TomFinley deleted the SimpleDVSample branch April 12, 2019 18:09
TomFinley added a commit to TomFinley/machinelearning that referenced this pull request Apr 12, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022