ML.NET CLI performance with more cores #645

Closed

bsoulier opened this issue Mar 31, 2020 · 3 comments

@bsoulier

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Would the ML.NET CLI benefit from having more cores at its disposal when training on large amounts of data (i.e., would more cores mean a quicker time to compute a model)?

Describe alternatives you've considered
N/A

Additional context
N/A

@LittleLittleCloud added this to the May 2020 milestone Apr 1, 2020
@LittleLittleCloud self-assigned this Apr 1, 2020

@LittleLittleCloud
Contributor

@justinormont Will we get a training speed-up using AutoML.NET when we have more cores?

@justinormont

Yes, ML.NET is multi-threaded, though its use in AutoML should be improved.

See dotnet/machinelearning#4092:

The CPU utilization will depend on which trainer is running.

A few optimizations we should make:

Apologies for the links to a non-public repo. We still have to transfer the issues to the current repo.

The NumberOfThreads task would be quite easy to implement (if it's even needed anymore). Part of that would be to check whether ML.NET currently defaults to a reasonable number of threads for the transforms, trainers, and the pipeline. Some background on threading in ML.NET: dotnet/machinelearning#136, dotnet/machinelearning#135, dotnet/machinelearning#277. I'm not sure which solution it landed on before releasing v1.0.
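
For illustration, a minimal sketch of what pinning a trainer's thread count looks like through the ML.NET Options pattern (SDCA is used as the example trainer here; which trainers expose NumberOfThreads, and what they default to, varies per trainer):

using System;
using Microsoft.ML;
using Microsoft.ML.Trainers;

// Sketch: cap the thread count of a single ML.NET trainer via its
// Options object. Leaving NumberOfThreads null lets ML.NET pick a
// default; setting it explicitly controls CPU utilization.
var mlContext = new MLContext();

var options = new SdcaLogisticRegressionBinaryTrainer.Options
{
    LabelColumnName = "Label",
    FeatureColumnName = "Features",
    NumberOfThreads = Environment.ProcessorCount  // use all visible cores
};

var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(options);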

In addition to the above, for NUMA architectures, we may want to enable: (info)

<configuration>
  <runtime>
    <gcServer enabled="true"/>
    <Thread_UseAllCpuGroups enabled="true"/>
  </runtime>
</configuration>

I'm unsure if we can set Thread_UseAllCpuGroups & gcServer for the AutoML API, since it ships as a library, but we can for the CLI & Model Builder. This should also be tested, as training a single model within a single NUMA group may be faster than incurring the cross-group latency. After implementing parallel sweeping, the optimal solution might be training one model per NUMA group.
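
As a quick sanity check (a sketch, not from the thread; it only assumes the standard GCSettings and Environment APIs), the process can report whether the runtime flags took effect:

using System;
using System.Runtime;

// Sketch: verify at runtime that the app.config settings are honored.
// GCSettings.IsServerGC reports whether server GC is active, and
// Environment.ProcessorCount should cover all CPU groups once
// Thread_UseAllCpuGroups applies (assumption: .NET Framework on a
// multi-group NUMA machine).
Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
Console.WriteLine($"Logical processors visible: {Environment.ProcessorCount}");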

@LittleLittleCloud
Contributor

I'm going to close this issue for cleanup; feel free to re-open it whenever you have questions.
