Sample fails with "The size of input lines is not consistent" #92
Some notes:
@isaacabraham - are you on Windows? If so, can you open the imdb_labelled.txt file in VS and "normalize" the line endings to …
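Normalizing the line endings can also be scripted instead of done by hand in VS. Below is a minimal Python sketch. It works on raw bytes, assuming an ASCII-compatible encoding such as UTF-8 or Latin-1, and takes the target ending as a parameter rather than assuming one. `normalize_line_endings` and `normalize_file` are hypothetical helpers, not part of ML.NET or VS.

```python
import re

# CRLF is listed first so that "\r\n" is treated as one line break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def normalize_line_endings(data: bytes, ending: bytes = b"\n") -> bytes:
    """Replace every CRLF, lone CR, or lone LF in `data` with `ending`."""
    if ending not in (b"\n", b"\r\n"):
        raise ValueError("ending must be b'\\n' or b'\\r\\n'")
    return _LINE_BREAK.sub(ending, data)


def normalize_file(path: str, ending: bytes = b"\n") -> None:
    """Rewrite the file at `path` in place with uniform line endings.

    CR and LF are single ASCII bytes that never occur inside multi-byte
    UTF-8 sequences, so operating on bytes avoids any decoding step.
    """
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_line_endings(data, ending))
```

For example, `normalize_file("imdb_labelled.txt", b"\r\n")` would rewrite the file with Windows-style endings.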
The other interesting thing I've noted about this data set is that it has 6 formatting errors:
Looking at the data, there are unmatched double quotes.
It is working for me on .NET Core 2.0. Another thing to try is to scrub these 6 formatting errors out of the file by removing the unmatched double quotes.
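Below is a minimal Python sketch for locating and scrubbing those lines. It assumes one example per line and treats a line with an odd number of `"` characters as containing an unmatched quote. Dropping the last quote on such a line is only a heuristic (pairing quotes left to right leaves the final one unpaired), so the offending lines are still worth eyeballing. The helper names are hypothetical.

```python
def find_unmatched_quote_lines(lines):
    """Return 1-based numbers of lines that contain an odd number of '"' characters."""
    return [i for i, line in enumerate(lines, start=1) if line.count('"') % 2 == 1]


def drop_unmatched_quote(line):
    """Remove the last '"' from a line with an odd quote count.

    Lines with an even number of quotes are returned unchanged.
    """
    if line.count('"') % 2 == 0:
        return line
    i = line.rfind('"')
    return line[:i] + line[i + 1:]
```

In practice you would read the file's lines, print `find_unmatched_quote_lines(lines)` to review the offenders, and then write back `[drop_unmatched_quote(l) for l in lines]`.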
Yes, Windows here! I've just tried it in VS2017 (previously I was using Code) and have normalised the line endings. Now I get a completely different error:
Is there any runtime reflection or lookup going on here? It all compiles in the script file; it's only when the Train method is called that it goes pop.
Note: even with the normalised file, I still get that error in Code.
(Sorry, I'm not even a novice in F#.) Can you show what is in …?
Yes, ML.NET uses a "catalog" of components, which are discovered and invoked using reflection. See machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs, lines 399 to 414 at commit c023727.
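The following Python analogy may help explain why a script can compile cleanly yet fail only when Train is called. When components are looked up by name at run time, a missing dependency goes unnoticed until the lookup actually loads it. This is a hypothetical `LazyCatalog`, not how ML.NET's ComponentCatalog is implemented.

```python
import importlib


class LazyCatalog:
    """Hypothetical name-based catalog: components are resolved only when requested."""

    def __init__(self):
        self._entries = {}  # component name -> (module name, attribute name)

    def register(self, name, module_name, attr_name):
        # Nothing is imported here, so a missing dependency is not detected yet.
        self._entries[name] = (module_name, attr_name)

    def create(self, name, *args, **kwargs):
        module_name, attr_name = self._entries[name]
        # The module is loaded only now, at call time; any failure surfaces here.
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)(*args, **kwargs)
```

Registering a component whose module is missing succeeds. The error appears only at `create(...)`, which is the same shape as "it compiles, then goes pop at Train".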
You can set "allowQuotedStrings = false" in TextLoader. I see that the text column is unquoted in every example except a few, and this sometimes causes the "The size of input lines is not consistent" error.
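The example below shows how a few stray quotes in otherwise unquoted text can produce inconsistent line sizes. It is an analogy that uses Python's `csv` module, not ML.NET's TextLoader, whose parsing rules may differ. The sample lines are made up and follow the sentence, tab, label shape assumed for imdb_labelled.txt. `csv.QUOTE_NONE` plays the role of "allowQuotedStrings = false".

```python
import csv
import io

# Made-up sample: the second line opens a quote that is never closed.
SAMPLE = 'Good movie.\t1\n"Awful, just awful.\t0\nLoved it.\t1\n'


def row_lengths(text, quoting):
    """Parse tab-separated `text` and return the number of fields in each row."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quoting=quoting)
    return [len(row) for row in reader]


# Quotes honoured: the stray '"' starts a quoted field that swallows the tab,
# the label, and the following line, so the row sizes no longer agree.
with_quotes = row_lengths(SAMPLE, csv.QUOTE_MINIMAL)

# Quotes treated as ordinary characters: every row has the expected two fields.
without_quotes = row_lengths(SAMPLE, csv.QUOTE_NONE)

print(with_quotes, without_quotes)
```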
@zeahmed Thanks - unfortunately changing to that gives a different error:
@eerhardt No problem. The file is generated by Paket to load all the assemblies that the ML library depends on. Here's what it contains:
I also see the warnings, which are expected based on dotnet/docs#5256 (comment). Is there a way to stop the warnings from being shown on the console?
Does this sample work in C#, then? Or is it the same issue with the sample data file?
Yes, a working example is here: dotnet/docs#5330
Small update here. I have managed to get this working within a console application by also replacing the records with mutable classes. From an F# perspective this is undesirable, but at least it's a starting point. I'm still unable to get it to work from a script, however, which in my opinion is very important from a data analysis point of view (@mathias-brandewinder can probably explain the rationale better than I can, as could probably any Python machine learning person...). The error I'm now seeing is:
The error you are getting is caused by the runtime not finding the "native" assemblies that ML.NET uses. These assemblies are in the NuGet package under the … We had a similar problem to the one above when using …
I don't have any real experience with the F# scripting tooling. How does it normally handle native (C++) assemblies contained in a NuGet package? Is there something we can/should do in the NuGet package? Or are native assemblies from a NuGet package not supported in F# scripting?
@eerhardt That helped, and I have it working now. There are a few ways of doing this. The issue is that the F# Interactive process (FSI.exe) can't see the native DLLs in any path or probing folder by default, so it can't find them. F# scripts do have the ability to add a folder or path to probing using the …
The most "fully featured" answer I found to this was here: http://christoph.ruegg.name/blog/loading-native-dlls-in-fsharp-interactive.html. By adding the path to the native DLLs before running the model, I got it to work, i.e.:

```fsharp
open System

// Folder holding ML.NET's native DLLs in the local NuGet cache (user- and version-specific path)
let nativeDirectory = @"C:\Users\Isaac\.nuget\packages\microsoft.ml\0.1.0\runtimes\win-x64\native"

// Append that folder to this process's PATH so FSI can resolve the native DLLs when they are loaded
Environment.SetEnvironmentVariable("Path", Environment.GetEnvironmentVariable("Path") + ";" + nativeDirectory)
```

Unfortunately, this is not especially easy to figure out. I've seen a similar issue recently with CosmosDB, which also uses some native assemblies; they aren't particularly easy to work with.
Regarding NuGet etc.: the main NuGet tooling is, to be honest, a dead loss from the point of view of F# scripting. You need some form of MSBuild project file to declare your dependencies, and there's no easy way to reference the assemblies anyway, which is one of the reasons why many F# developers use Paket instead. Paket already supports generating a "load dependencies" file for scripts (as seen in my earlier post here), but it doesn't know about native DLLs. @forki, do you think this is something that could be added to Paket's generate-load-scripts functionality? Are native folders a "proper" thing in NuGet packages?
Check out https://docs.microsoft.com/en-us/nuget/create-packages/supporting-multiple-target-frameworks#architecture-specific-folders for the docs on the
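The native folder can be derived from the NuGet package layout instead of hard-coding a user-specific path like the one in the FSI workaround above. Below is a minimal Python sketch. It assumes the global packages folder layout `<root>/<lowercase id>/<lowercase version>/runtimes/<rid>/native`, and the helper names are hypothetical. Like the F# snippet, it appends the folder to a PATH-style value, but it uses the OS path separator instead of a hard-coded `;`.

```python
import os
from pathlib import Path


def native_dir(packages_root, package_id, version, rid):
    """Build <packages_root>/<id>/<version>/runtimes/<rid>/native.

    Package folders in the NuGet global packages folder are assumed to be
    lowercase, hence the .lower() calls.
    """
    return Path(packages_root) / package_id.lower() / version.lower() / "runtimes" / rid / "native"


def append_to_path(path_value, directory):
    """Return `path_value` with `directory` appended once, using os.pathsep."""
    entries = [e for e in path_value.split(os.pathsep) if e]
    if str(directory) not in entries:
        entries.append(str(directory))
    return os.pathsep.join(entries)
```

Usage mirroring the F# workaround: `os.environ["PATH"] = append_to_path(os.environ.get("PATH", ""), native_dir(root, "Microsoft.ML", "0.1.0", "win-x64"))`, where `root` is your global packages folder.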
@eerhardt Is there any way to avoid relying on these native DLLs?
Currently, no, the native assemblies are required. However, we are exploring other options here. The CpuMath assembly is written in C++ because it uses SIMD instructions, which were only available in C/C++. With .NET Core 2.1, these SIMD instructions are available through .NET APIs, so we could replace the CpuMath assembly with .NET code that uses the same instructions. On .NET Framework we would still require the native assembly in order to use the SIMD instructions, because this support is only in .NET Core. Another option is to provide software fallback methods, which would of course be slower, but would have the advantage of wider reach where the SIMD instructions aren't available (for example, on ARM processors).
Please tag this with "F#" (though it might not be specifically related to F#).
I'm trying out the sample shown here. However, whenever I try to train the model I get an error: "The size of input lines is not consistent". This is using the exact files that are specified in the tutorial so I'm not sure where I'm going wrong - any ideas?
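One way to find which lines trigger "The size of input lines is not consistent" is to count the fields on each line and flag those that differ from the most common count. Below is a minimal Python sketch. It assumes tab-separated data and deliberately ignores quoting, which roughly corresponds to "allowQuotedStrings = false". `inconsistent_lines` is a hypothetical helper, and ML.NET's own check may differ.

```python
from collections import Counter


def inconsistent_lines(lines, delimiter="\t"):
    """Return (line_number, field_count) for every line whose field count
    differs from the most common field count in the file."""
    counts = [line.rstrip("\r\n").count(delimiter) + 1 for line in lines]
    if not counts:
        return []
    expected, _ = Counter(counts).most_common(1)[0]
    return [(i, n) for i, n in enumerate(counts, start=1) if n != expected]
```

Running this over the lines of imdb_labelled.txt gives the line numbers to inspect, for example for stray tabs or unmatched quotes.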