Skip to content

ML.NET Notes #74

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 8, 2018
Merged

ML.NET Notes #74

merged 2 commits into from
Nov 8, 2018

Conversation

terrajobst
Copy link
Contributor

No description provided.

@TomFinley
Copy link

TomFinley commented Nov 5, 2018

Hi @terrajobst , thanks for posting this. In order to track the correspondence of work to these notes, I will mark as comments on the specific lines corresponding issues (or comments on issues, where appropriate), where they are posted.

If you would prefer in future I provide these annotations in a different way, please let me know, thanks!


## Notes

* `Column` can be a struct
Copy link

@TomFinley TomFinley Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as a comment here, to address this and some other commetns.

way, but it's hard to make the much simpler as they would have to be
generic or the user would have to handle the type
* `IDataView`
- `ISchematized` is being removed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific line entered as 1502.

generic or the user would have to handle the type
* `IDataView`
- `ISchematized` is being removed
- Should be an abstract class

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line item entered as 1528.

implementers can still return `null`.
- Either make it a property `long? Count` or make it
`bool TryGetCount(out long count)`.
- `GetRowCursor()` instead of taking `Func<int, bool>` make it `Predicate<Column>`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line item entered as 1529.

* `IDataView`
- `ISchematized` is being removed
- Should be an abstract class
- `GetRowCount()`. `lazy = true` means get if you know it, otherwise don't

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line item entered as 1531.

which makes it easier to reason about.
- We should look into whether we should have an async version
- We want to get rid of the the `IRowCursorConsolidator` base
* `ICounted`
Copy link

@TomFinley TomFinley Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the things like this related to IRowCursor I have bundled into a single issue 1532 since I felt like this refactoring is something worth considering collectively, rather than individual items, since many of them are related. This actually includes all the rest of the file up to the ColumnType item.

... Except for the UInt128 issue now that I see it.

* `IRowToRowMapper`
- `Action disposer` should probably be replaced by making the right type
(`IRow`?) implement `IDisposable`.
- `Func<int, bool>` is hard to get right. Should this be `Predicate<Column>`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested this as part of 1529, which I'd mentioned previously.

as well?
* Can we remove `DataKind` entirely? If not, we should probably rename some of the
members, remove the overlap, and add an entry for `0`, like `Unknown` or `None`.
* `ColumnType`
Copy link

@TomFinley TomFinley Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entered as 1533, with some expansion, please take a look and tell me what you think. This includes a solution to the DataKind problem that I think will work pretty well and at fairly little trouble to anyone...

@terrajobst
Copy link
Contributor Author

terrajobst commented Nov 7, 2018

Hi @terrajobst , thanks for posting this. In order to track the correspondence of work to these notes, I will mark as comments on the specific lines corresponding issues (or comments on issues, where appropriate), where they are posted.

These notes are mostly published to help us (to remeber what happened in the meeting) and the community (to see what's going on). Commenting here is a fine. We could also file work items and link them from the notes, notes but that might be overkill. So whatever helps you stay organized is The Correct Process ™️ 😄

@terrajobst terrajobst merged commit 2215329 into master Nov 8, 2018
@terrajobst terrajobst deleted the ml-net branch November 8, 2018 21:44
- `GetRowCursor()` instead of taking `Func<int, bool>` make it `Predicate<Column>`
which makes it easier to reason about.
- We should look into whether we should have an async version
- We want to get rid of the the `IRowCursorConsolidator` base

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost forgot about this one. Entered as 1867.

* `IRowCursor` extending `IRow` seems odd. Can these be collapsed? If not, can
they have an inheritance relationship when we're making them classes?
* `IRowToRowMapper`
- `Action disposer` should probably be replaced by making the right type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions about disposability of IRow (or now Row) were fixed in 1867 so this strangeness around disposing methods has been resolved, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants