Failure while using WordEmbeddings [Regression] #873

Closed
Anipik opened this issue Sep 10, 2018 · 3 comments
@Anipik
Contributor

Anipik commented Sep 10, 2018

I am using Maml.MainAll(string cmd) for producing the benchmark results.

The command that I tried is

maml.exe CV
tr=OVA{p=AveragedPerceptron{iter=10}}
k=5
loader=TextLoader{quote=- sparse=- col=Label:R4:0 col=rev_id:TX:1 col=comment:TX:2 col=logged_in:BL:4 col=ns:TX:5 col=sample:TX:6 col=split:TX:7 col=year:R4:3 header=+}
data=c:\git\machinelearning\bin\AnyCPU.Release\Microsoft.ML.Benchmarks\netcoreapp2.1\491da934-573a-4301-bb13-570ecee3aed2\bin\Release\netcoreapp2.1\external\WikiDetoxAnnotated160kRows.tsv
xf=Convert{col=logged_in type=R4}
xf=CategoricalTransform{col=ns}
xf=TextTransform{col=FeaturesText:comment tokens=+ wordExtractor=NGramExtractorTransform{ngram=2}}
xf=WordEmbeddingsTransform{col=FeaturesWordEmbedding:FeaturesText_TransformedText model=FastTextWikipedia300D}
xf=Concat{col=Features:FeaturesText,FeaturesWordEmbedding,logged_in,ns}

The error I get is

--- Command line args ---
CV tr=OVA{p=AveragedPerceptron{iter=10}} k=5 loader=TextLoader{quote=- sparse=- col=Label:R4:0 col=rev_id:TX:1 col=comment:TX:2 col=logged_in:BL:4 col=ns:TX:5 col=sample:TX:6 col=split:TX:7 col=year:R4:3 header=+} data= c:\git\machinelearning\bin\AnyCPU.Release\Microsoft.ML.Benchmarks\netcoreapp2.1\95c44c13-fcff-45f8-8427-370a4bb30f47\bin\Release\netcoreapp2.1\external\WikiDetoxAnnotated160kRows.tsv xf=Convert{col=logged_in type=R4} xf=CategoricalTransform{col=ns} xf=TextTransform{col=FeaturesText:comment tokens=+ wordExtractor=NGramExtractorTransform{ngram=2}} xf=WordEmbeddingsTransform{col=FeaturesWordEmbedding:FeaturesText_TransformedText model=FastTextWikipedia300D} xf=Concat{col=Features:FeaturesText,FeaturesWordEmbedding,logged_in,ns}
--- Exception message ---
(1) Unexpected exception: One or more errors occurred. (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.), 'System.AggregateException'
   at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.GetCrossValidationTasks() in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 503
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.RunCore(IChannel ch, String cmd) in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 200
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.Run() in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 130
   at Microsoft.ML.Runtime.Tools.Maml.MainCore(TlcEnvironment env, String args, Boolean alwaysPrintStacktrace) in c:\git\machinelearning\src\Microsoft.ML.Maml\MAML.cs:line 140
(2) Unexpected exception: Exception has been thrown by the target of an invocation., 'System.Reflection.TargetInvocationException'
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) in c:\git\machinelearning\src\Microsoft.ML.Core\ComponentModel\ComponentCatalog.cs:line 212
   at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra) in c:\git\machinelearning\src\Microsoft.ML.Core\ComponentModel\ComponentCatalog.cs:line 977
   at Microsoft.ML.Runtime.ComponentCatalog.CreateInstance[TRes](IHostEnvironment env, Type signatureType, String name, String options, Object[] extra) in c:\git\machinelearning\src\Microsoft.ML.Core\ComponentModel\ComponentCatalog.cs:line 901
   at Microsoft.ML.Runtime.CommandLine.CmdParser.ComponentFactoryFactory.ComponentFactory`2.CreateComponent(IHostEnvironment env, TArg1 argument1) in c:\git\machinelearning\src\Microsoft.ML.Core\CommandLine\CmdParser.cs:line 2659
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.CreateRoleMappedData(IHostEnvironment env, IChannel ch, IDataView data, ITrainer trainer) in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 265
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.RunFold(Int32 fold) in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 524
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
(3) Unexpected exception: Source array was not long enough. Check the source index, length, and the array's lower bounds.
Parameter name: sourceArray, 'System.ArgumentException'
   at System.Array.Copy(Array sourceArray, Int32 sourceIndex, Array destinationArray, Int32 destinationIndex, Int32 length, Boolean reliable)
   at Microsoft.ML.Runtime.Internal.Utilities.BigArray`1.AddRange(T[] src, Int32 length) in c:\git\machinelearning\src\Microsoft.ML.Core\Utilities\BigArray.cs:line 337
   at Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.GetVocabularyDictionary() in c:\git\machinelearning\src\Microsoft.ML.Transforms\Text\WordEmbeddingsTransform.cs:line 436
   at Microsoft.ML.Runtime.Data.WordEmbeddingsTransform..ctor(IHostEnvironment env, Arguments args, IDataView input) in c:\git\machinelearning\src\Microsoft.ML.Transforms\Text\WordEmbeddingsTransform.cs:line 154

I also tried it using the Microsoft.console project, but I get similar results.

Longer call stack:

 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Contracts.DbgFailCore(string msg, Microsoft.ML.Runtime.IExceptionContext ctx) Line 772	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Contracts.DbgFail(Microsoft.ML.Runtime.IExceptionContext ctx) Line 779	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Contracts.Assert(Microsoft.ML.Runtime.IExceptionContext ctx, bool f) Line 834	C#
>	Microsoft.ML.Transforms.dll!Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.Model.AddWordVector(Microsoft.ML.Runtime.IChannel ch, string word, float[] wordVector) Line 101	C#
 	Microsoft.ML.Transforms.dll!Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.GetVocabularyDictionary() Line 433	C#
 	Microsoft.ML.Transforms.dll!Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.WordEmbeddingsTransform(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.Arguments args, Microsoft.ML.Runtime.Data.IDataView input) Line 154	C#
 	[Native to Managed Transition]	
 	[Managed to Native Transition]	
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(object[] ctorArgs) Line 209	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstance(Microsoft.ML.Runtime.IHostEnvironment env, object args, object[] extra) Line 233	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance<Microsoft.ML.Runtime.Data.IDataTransform>(Microsoft.ML.Runtime.IHostEnvironment env, System.Type signatureType, out Microsoft.ML.Runtime.Data.IDataTransform result, string name, string options, object[] extra) Line 977	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.CreateInstance<Microsoft.ML.Runtime.Data.IDataTransform>(Microsoft.ML.Runtime.IHostEnvironment env, System.Type signatureType, string name, string options, object[] extra) Line 901	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.CommandLine.CmdParser.ComponentFactoryFactory.ComponentFactory<Microsoft.ML.Runtime.Data.IDataView, Microsoft.ML.Runtime.Data.IDataTransform>.CreateComponent(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.Data.IDataView argument1) Line 2659	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.CrossValidationCommand.CreateRoleMappedData(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.IChannel ch, Microsoft.ML.Runtime.Data.IDataView data, Microsoft.ML.Runtime.ITrainer trainer) Line 266	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.RunFold(int fold) Line 524	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.GetCrossValidationTasks.AnonymousMethod__0() Line 494	C#
 	System.Private.CoreLib.dll!System.Threading.Tasks.Task<Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.FoldResult>.InnerInvoke() Line 621	C#
 	System.Private.CoreLib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 145	C#
 	System.Private.CoreLib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot) Line 2454	C#
 	System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 582	C#

The same command runs without errors on TLC 3.10.

Debugging

The array being created has size 1: https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Transforms/Text/WordEmbeddingsTransform.cs#L431

array[0] = 300 // the dimension of the word vectors being downloaded; this leads to the failure later on.
The file is unchanged from the internal repo.
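
To illustrate how a one-element array can produce the "Source array was not long enough" error, here is a minimal Python sketch. It is not the ML.NET code. It assumes the model file starts with a fastText-style "count dimension" header line, which would explain a single value equal to the dimension, and it uses hypothetical helpers `parse_vector_line` and `add_range` plus a toy dimension of 3 instead of 300.

```python
def parse_vector_line(line):
    """Split a text line into (word, values): first token is the word, the rest are floats."""
    tokens = line.rstrip().split(" ")
    return tokens[0], [float(t) for t in tokens[1:]]


def add_range(store, src, length):
    """Append exactly `length` items from `src`; fail if `src` is too short (hypothetical helper)."""
    if len(src) < length:
        raise ValueError("source array was not long enough")
    store.extend(src[:length])


# Hypothetical file contents: a "count dimension" header, then one 3-dimensional vector.
lines = ["1 3", "hello 0.1 0.2 0.3"]
dimension = 3

# Parsing the header as if it were a word vector yields a one-element array
# whose only value is the dimension itself.
header_word, header_values = parse_vector_line(lines[0])

store = []
try:
    # Copying `dimension` values out of a one-element array cannot succeed.
    add_range(store, header_values, dimension)
    failed = False
except ValueError:
    failed = True
```

If the assumption holds, the parser treats the header's second token as the whole vector, and the copy of `dimension` values from it fails.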

cc @danmosemsft @sfilipi @eerhardt @shauheen

@Anipik
Contributor Author

Anipik commented Sep 10, 2018

https://github.com/Anipik/machinelearning/pull/1

I added a similar dimension check here.
@sfilipi @Ivanidzo4ka
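
A dimension check of this kind could look roughly like the Python sketch below. The linked change itself is not reproduced here, so this is only an assumption about its general shape: `load_word_vectors` is a hypothetical function that skips any line whose number of values differs from the expected dimension, rather than trying to copy it.

```python
def load_word_vectors(lines, dimension):
    """Return (vectors, skipped): a word -> vector dict and the count of rejected lines."""
    vectors = {}
    skipped = 0
    for line in lines:
        tokens = line.rstrip().split(" ")
        word, raw_values = tokens[0], tokens[1:]
        # Dimension check: reject any line that does not carry exactly `dimension` values.
        if len(raw_values) != dimension:
            skipped += 1
            continue
        vectors[word] = [float(v) for v in raw_values]
    return vectors, skipped


# Hypothetical input: a two-token header line followed by two 3-dimensional vectors.
sample = ["2 3", "hello 0.1 0.2 0.3", "world 0.4 0.5 0.6"]
vectors, skipped = load_word_vectors(sample, dimension=3)
```

With this input the header line is counted as skipped and only the two well-formed vectors are kept.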

@eerhardt
Member

Can you check if this is failing because some model isn't available on https://aka.ms/tlc-resources/?

@Anipik
Contributor Author

Anipik commented Sep 11, 2018

I can check the logs and Fiddler to see whether this is caused by missing resources, but that is unlikely: it works after I added the dimension check (#880 (comment)), and it also works with the Maml.exe app.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022