
Failure while using WordEmbeddings [Regression] #873

Closed
@Anipik

Description


I am using Maml.MainAll(string cmd) to produce the benchmark results.

The command that I tried is

maml.exe CV
tr=OVA{p=AveragedPerceptron{iter=10}}
k=5
loader=TextLoader{quote=- sparse=- col=Label:R4:0 col=rev_id:TX:1 col=comment:TX:2 col=logged_in:BL:4 col=ns:TX:5 col=sample:TX:6 col=split:TX:7 col=year:R4:3 header=+}
data=c:\git\machinelearning\bin\AnyCPU.Release\Microsoft.ML.Benchmarks\netcoreapp2.1\491da934-573a-4301-bb13-570ecee3aed2\bin\Release\netcoreapp2.1\external\WikiDetoxAnnotated160kRows.tsv
xf=Convert{col=logged_in type=R4}
xf=CategoricalTransform{col=ns}
xf=TextTransform{col=FeaturesText:comment tokens=+ wordExtractor=NGramExtractorTransform{ngram=2}}
xf=WordEmbeddingsTransform{col=FeaturesWordEmbedding:FeaturesText_TransformedText model=FastTextWikipedia300D}
xf=Concat{col=Features:FeaturesText,FeaturesWordEmbedding,logged_in,ns}

The error I get is

--- Command line args ---
CV tr=OVA{p=AveragedPerceptron{iter=10}} k=5 loader=TextLoader{quote=- sparse=- col=Label:R4:0 col=rev_id:TX:1 col=comment:TX:2 col=logged_in:BL:4 col=ns:TX:5 col=sample:TX:6 col=split:TX:7 col=year:R4:3 header=+} data= c:\git\machinelearning\bin\AnyCPU.Release\Microsoft.ML.Benchmarks\netcoreapp2.1\95c44c13-fcff-45f8-8427-370a4bb30f47\bin\Release\netcoreapp2.1\external\WikiDetoxAnnotated160kRows.tsv xf=Convert{col=logged_in type=R4} xf=CategoricalTransform{col=ns} xf=TextTransform{col=FeaturesText:comment tokens=+ wordExtractor=NGramExtractorTransform{ngram=2}} xf=WordEmbeddingsTransform{col=FeaturesWordEmbedding:FeaturesText_TransformedText model=FastTextWikipedia300D} xf=Concat{col=Features:FeaturesText,FeaturesWordEmbedding,logged_in,ns}
--- Exception message ---
(1) Unexpected exception: One or more errors occurred. (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.) (Exception has been thrown by the target of an invocation.), 'System.AggregateException'
   at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.GetCrossValidationTasks() in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 503
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.RunCore(IChannel ch, String cmd) in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 200
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.Run() in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 130
   at Microsoft.ML.Runtime.Tools.Maml.MainCore(TlcEnvironment env, String args, Boolean alwaysPrintStacktrace) in c:\git\machinelearning\src\Microsoft.ML.Maml\MAML.cs:line 140
(2) Unexpected exception: Exception has been thrown by the target of an invocation., 'System.Reflection.TargetInvocationException'
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) in c:\git\machinelearning\src\Microsoft.ML.Core\ComponentModel\ComponentCatalog.cs:line 212
   at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra) in c:\git\machinelearning\src\Microsoft.ML.Core\ComponentModel\ComponentCatalog.cs:line 977
   at Microsoft.ML.Runtime.ComponentCatalog.CreateInstance[TRes](IHostEnvironment env, Type signatureType, String name, String options, Object[] extra) in c:\git\machinelearning\src\Microsoft.ML.Core\ComponentModel\ComponentCatalog.cs:line 901
   at Microsoft.ML.Runtime.CommandLine.CmdParser.ComponentFactoryFactory.ComponentFactory`2.CreateComponent(IHostEnvironment env, TArg1 argument1) in c:\git\machinelearning\src\Microsoft.ML.Core\CommandLine\CmdParser.cs:line 2659
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.CreateRoleMappedData(IHostEnvironment env, IChannel ch, IDataView data, ITrainer trainer) in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 265
   at Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.RunFold(Int32 fold) in c:\git\machinelearning\src\Microsoft.ML.Data\Commands\CrossValidationCommand.cs:line 524
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
(3) Unexpected exception: Source array was not long enough. Check the source index, length, and the array's lower bounds.
Parameter name: sourceArray, 'System.ArgumentException'
   at System.Array.Copy(Array sourceArray, Int32 sourceIndex, Array destinationArray, Int32 destinationIndex, Int32 length, Boolean reliable)
   at Microsoft.ML.Runtime.Internal.Utilities.BigArray`1.AddRange(T[] src, Int32 length) in c:\git\machinelearning\src\Microsoft.ML.Core\Utilities\BigArray.cs:line 337
   at Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.GetVocabularyDictionary() in c:\git\machinelearning\src\Microsoft.ML.Transforms\Text\WordEmbeddingsTransform.cs:line 436
   at Microsoft.ML.Runtime.Data.WordEmbeddingsTransform..ctor(IHostEnvironment env, Arguments args, IDataView input) in c:\git\machinelearning\src\Microsoft.ML.Transforms\Text\WordEmbeddingsTransform.cs:line 154
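One reading of exception (3), stated as an assumption rather than a confirmed diagnosis: for model=FastTextWikipedia300D the transform expects 300 values per word and copies that many into a flat buffer, so a 1-element vector makes the copy read past the end of its source. The Python sketch mimics that bounds check with a hypothetical add_range helper; it is not ML.NET's BigArray code.

```python
# Hypothetical sketch: mimics the bounds check System.Array.Copy performs
# when asked to copy `length` elements out of `src`. This is NOT ML.NET code.
from typing import List


def add_range(buffer: List[float], src: List[float], length: int) -> None:
    """Append the first `length` values of `src` to `buffer`.

    Raises ValueError (standing in for .NET's ArgumentException) when `src`
    holds fewer than `length` values, the condition Array.Copy reports as
    "Source array was not long enough".
    """
    if length > len(src):
        raise ValueError(
            f"Source array was not long enough: need {length}, have {len(src)}"
        )
    buffer.extend(src[:length])


EXPECTED_DIM = 300  # assumption: dimension implied by model=FastTextWikipedia300D

flat_vectors: List[float] = []

# A well-formed 300-dimensional word vector is copied without complaint.
add_range(flat_vectors, [0.1] * EXPECTED_DIM, EXPECTED_DIM)

# A 1-element "vector" such as [300.0] cannot supply 300 values.
try:
    add_range(flat_vectors, [300.0], EXPECTED_DIM)
except ValueError as err:
    print(err)  # prints "Source array was not long enough: need 300, have 1"
```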

I also tried it using the Microsoft.console project and got similar results.

Longer call stack:

 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Contracts.DbgFailCore(string msg, Microsoft.ML.Runtime.IExceptionContext ctx) Line 772	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Contracts.DbgFail(Microsoft.ML.Runtime.IExceptionContext ctx) Line 779	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Contracts.Assert(Microsoft.ML.Runtime.IExceptionContext ctx, bool f) Line 834	C#
>	Microsoft.ML.Transforms.dll!Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.Model.AddWordVector(Microsoft.ML.Runtime.IChannel ch, string word, float[] wordVector) Line 101	C#
 	Microsoft.ML.Transforms.dll!Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.GetVocabularyDictionary() Line 433	C#
 	Microsoft.ML.Transforms.dll!Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.WordEmbeddingsTransform(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.Data.WordEmbeddingsTransform.Arguments args, Microsoft.ML.Runtime.Data.IDataView input) Line 154	C#
 	[Native to Managed Transition]	
 	[Managed to Native Transition]	
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(object[] ctorArgs) Line 209	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstance(Microsoft.ML.Runtime.IHostEnvironment env, object args, object[] extra) Line 233	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance<Microsoft.ML.Runtime.Data.IDataTransform>(Microsoft.ML.Runtime.IHostEnvironment env, System.Type signatureType, out Microsoft.ML.Runtime.Data.IDataTransform result, string name, string options, object[] extra) Line 977	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.ComponentCatalog.CreateInstance<Microsoft.ML.Runtime.Data.IDataTransform>(Microsoft.ML.Runtime.IHostEnvironment env, System.Type signatureType, string name, string options, object[] extra) Line 901	C#
 	Microsoft.ML.Core.dll!Microsoft.ML.Runtime.CommandLine.CmdParser.ComponentFactoryFactory.ComponentFactory<Microsoft.ML.Runtime.Data.IDataView, Microsoft.ML.Runtime.Data.IDataTransform>.CreateComponent(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.Data.IDataView argument1) Line 2659	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.CrossValidationCommand.CreateRoleMappedData(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.IChannel ch, Microsoft.ML.Runtime.Data.IDataView data, Microsoft.ML.Runtime.ITrainer trainer) Line 266	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.RunFold(int fold) Line 524	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.GetCrossValidationTasks.AnonymousMethod__0() Line 494	C#
 	System.Private.CoreLib.dll!System.Threading.Tasks.Task<Microsoft.ML.Runtime.Data.CrossValidationCommand.FoldHelper.FoldResult>.InnerInvoke() Line 621	C#
 	System.Private.CoreLib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 145	C#
 	System.Private.CoreLib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot) Line 2454	C#
 	System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 582	C#

It works with TLC 3.10: the same command runs without errors on that version.

Debugging

The array being created at https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Transforms/Text/WordEmbeddingsTransform.cs#L431 has size 1:

array[0] = 300 // the dimension of the downloaded word vectors

This leads to the failure later on. The file is unchanged compared with the internal repo.
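fastText's text-format .vec files begin with a header line of the form "<vocabulary size> <dimension>". Assuming the transform parses that header as if it were an ordinary word vector, it would produce exactly the 1-element array [300] described above. A minimal Python sketch of a header-aware reader follows; load_word_vectors and its parameters are hypothetical names, and this is neither the ML.NET implementation nor the actual fix.

```python
# Hypothetical sketch of a header-aware reader for fastText-style .vec text.
# Names and structure are illustrative; this is not the ML.NET implementation.
from typing import Dict, Iterable, List


def load_word_vectors(lines: Iterable[str], expected_dim: int) -> Dict[str, List[float]]:
    vectors: Dict[str, List[float]] = {}
    for line_index, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # A fastText .vec file starts with "<vocabulary size> <dimension>".
        # Parsed naively, that line becomes the word "<vocabulary size>"
        # with the 1-element vector [<dimension>], e.g. [300.0].
        if line_index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        word, values = parts[0], parts[1:]
        if len(values) != expected_dim:
            # Skip malformed rows instead of copying a short vector.
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


sample = [
    "3 4",  # header: 3 words, 4 dimensions (toy numbers, not real data)
    "the 0.1 0.2 0.3 0.4",
    "cat 0.5 0.6 0.7 0.8",
    "dog 0.9 1.0 1.1 1.2",
]
vectors = load_word_vectors(sample, expected_dim=4)
print(sorted(vectors))  # prints ['cat', 'dog', 'the']
```

Rejecting rows whose length differs from expected_dim is a design choice: it keeps a short vector from ever reaching a fixed-length copy.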

cc @danmosemsft @sfilipi @eerhardt @shauheen

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests