TextCatalog.ApplyWordEmbedding to KMeans Trainer generates IndexOutOfRangeException #4397

Closed
MaxAkbar opened this issue Oct 28, 2019 · 2 comments

Comments

@MaxAkbar

System information

  • OS version/distro: Windows 10 PRO 10.0.18362
  • .NET Version (eg., dotnet --info): 3.1.100-preview1-014459

Issue

I am trying to cluster a group of documents. For this sample, I used the short descriptions of news articles. If I run the sample with FeaturizeText, it builds a model. If I apply TextCatalog.ApplyWordEmbedding instead, I get a System.IndexOutOfRangeException.

  • What did you do? Applied word embeddings (ApplyWordEmbedding) ahead of the KMeans trainer
  • What happened? IndexOutOfRangeException
  • What did you expect? For ML.NET to build my model

Source code / logs

Sample code to reproduce the problem can be found here.

StackTrace:
System.AggregateException: One or more errors occurred. (Index was outside the bounds of the array.) (Index was outside the bounds of the array.) (Index was outside the bounds of the array.)
---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.WaitAll(Task[] tasks)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Microsoft.ML.Trainers.KMeansUtils.ParallelMapReduce[TPartitionState,TGlobalState](Int32 numThreads, IHost baseHost, Factory factory, RowIndexGetter rowIndexGetter, InitAction`1 initChunk, MapAction`1 mapper, ReduceAction`2 reducer, TPartitionState[]& buffer, TGlobalState& result)
at Microsoft.ML.Trainers.KMeansBarBarInitialization.Initialize(IHost host, Int32 numThreads, IChannel ch, Factory cursorFactory, Int32 k, Int32 dimensionality, VBuffer`1[] centroids, Int64 accelMemBudgetMb, Int64& missingFeatureCount, Int64& totalTrainingInstances)
at Microsoft.ML.Trainers.KMeansTrainer.TrainCore(IChannel ch, RoleMappedData data, Int32 dimensionality)
at Microsoft.ML.Trainers.KMeansTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at ClusteringNewsArticles.Train.Program.Main(String[] args) in C:\Users\maxim\Source\Repos\machinelearning-samples\samples\csharp\getting-started\Clustering_NewsArticles\ClusteringNewsArticles.Train\Program.cs:line 54
---> (Inner Exception #1) System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---

---> (Inner Exception #2) System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---

@MaxAkbar
Author

So it looks like the data wasn't clean. The text had line breaks, and some of the data had Russian headlines :(. I will clean the data and try again.
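For anyone who lands here with the same exception, here is a minimal sketch of the kind of cleanup described above, using only the .NET standard library. The class and method names (HeadlineCleaner, NormalizeWhitespace, LooksEnglish, Clean) are hypothetical and are not part of ML.NET or of the original sample. It assumes each row is a single text field, and that "Russian headlines" can be approximated by the presence of any Cyrillic character. The thread does not confirm that this exact cleanup resolves the exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of ML.NET or the original sample.
public static class HeadlineCleaner
{
    // Any run of whitespace, including \r and \n.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Any character in the Unicode Cyrillic block.
    private static readonly Regex Cyrillic = new Regex(@"\p{IsCyrillic}", RegexOptions.Compiled);

    // Collapse line breaks and repeated spaces into single spaces.
    public static string NormalizeWhitespace(string text) =>
        Whitespace.Replace(text ?? string.Empty, " ").Trim();

    // Assumption: a row is kept only if it is non-empty and has no Cyrillic characters.
    public static bool LooksEnglish(string text) =>
        !string.IsNullOrWhiteSpace(text) && !Cyrillic.IsMatch(text);

    // Normalize every row, then drop the rows that fail the filter.
    public static IEnumerable<string> Clean(IEnumerable<string> rows) =>
        rows.Select(NormalizeWhitespace).Where(LooksEnglish);
}
```

Calling HeadlineCleaner.Clean on the raw descriptions before loading them into an IDataView would keep multi-line rows as single lines and drop the Cyrillic ones. A stricter language filter may be needed if the data set contains other non-English text.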

@MaxAkbar
Author

Closing the issue; the data wasn't clean.

@ghost locked as resolved and limited conversation to collaborators on Mar 20, 2022