V1 Scenarios need to be covered by tests #2498
Comments
So we aren't going to support training from an IEnumerable (directly) backed by an EF DbContext in V1? If so, we'd better note that somewhere, because it works up to a point and then fails in a confusing way.
@endintiers For V1, we will only support training with IDataView, but that should still be possible with an IEnumerable backed by an EF DbContext. (@singlis and @Ivanidzo4ka) Would you mind giving an example of what you've been doing and how it's been failing?
See #2159 for more details.
I believe setting conc to 1 for mlContext should help, but it needs verification.
Adding the SQL test back; I had misunderstood the requirements, and it looks like we can use an EF DbContext for it.
Tar. I have looked at how to modify mlContext.CreateStreamingDataView so that it could detect this case and create an IDataView that signals 'single-threaded source' downstream. This could be done (the only issue is synchronizing the buffer reloads). Given the release timeline, though, just setting conc to 1 is a good move. Sadly, this will slow training (on serious datasets with many available CPUs). I should volunteer to do at least this test... (using EF Core In-Memory). In the real world, generating text files from the DB and training on them instead seems to be the best move.
I have a sample that reads data from a SQL database. I can create another one using one of the data sets used in the samples. Would the connection string be hidden from the sample, since it's not necessary for the sample?
@jwood803 You should load the (textfile?) dataset into an in-memory database provider such as Microsoft.EntityFrameworkCore.InMemory. These are functionally equivalent to real DB providers and are used to build DB tests. You won't need a connection string.
In issue #584, we laid out a set of scenarios that we'd like to cover for V1.0 of ML.NET. We need high-level functional tests to make sure that these work well in the 1.0 library.
Here is a list of tests that cover the scenarios. Let's use this issue as a top-level issue to track coverage of the APIs.
For the input "Help I'm a bug!", I should be able to see the steps where it is normalized to "help i'm a bug",
then tokenized into ["help", "i'm", "a", "bug"],
then mapped into term numbers [203, 25, 3, 511],
then projected into the sparse float vector {3:1, 25:1, 203:1, 511:1}, etc.
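The intermediate steps in that example can be sketched in a few lines of plain Python. This is only an illustration of what "seeing each step" means, not ML.NET code and not how ML.NET implements its text transforms. The function names and the `vocabulary` mapping are hypothetical; the vocabulary is chosen only so that the term numbers match the ones quoted above.

```python
import re


def normalize(text):
    # Lowercase and drop punctuation, keeping apostrophes so "i'm" survives.
    return re.sub(r"[^\w\s']", "", text.lower()).strip()


def tokenize(text):
    # Split on whitespace.
    return text.split()


def map_to_terms(tokens, vocabulary):
    # Look up each token's term number in a (hypothetical) vocabulary.
    return [vocabulary[token] for token in tokens]


def to_sparse_vector(term_ids):
    # Count occurrences per term number; keys are sorted for readability.
    counts = {}
    for term_id in term_ids:
        counts[term_id] = counts.get(term_id, 0.0) + 1.0
    return dict(sorted(counts.items()))


# Assumed vocabulary, picked to reproduce the numbers in the issue text.
vocabulary = {"help": 203, "i'm": 25, "a": 3, "bug": 511}

normalized = normalize("Help I'm a bug!")
tokens = tokenize(normalized)
term_ids = map_to_terms(tokens, vocabulary)
sparse = to_sparse_vector(term_ids)

# Each intermediate value can be inspected on its own.
for name, value in [("normalized", normalized), ("tokens", tokens),
                    ("term_ids", term_ids), ("sparse", sparse)]:
    print(name, "=", value)
```

A functional test for this scenario would assert on each intermediate value in the same way, rather than only on the final feature vector.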