Inconsistency in Training With Multiple Threads #1144

Closed
ArieJones opened this issue Oct 4, 2018 · 3 comments

Comments

@ArieJones

We are currently using version 0.50 to create some classifier models, but we are seeing some strange behavior. We set the number of threads on our classifiers because of #217, since we want to be able to control the CPU usage on the server.
So when using a classifier like so:

```csharp
var algo = new StochasticDualCoordinateAscentClassifier()
{
    Caching = CachingOptions.Disk,
    MaxIterations = 100,
    LossFunction = new SmoothedHingeLossSDCAClassificationLossFunction(),
    Shuffle = false,
    NumThreads = System.Environment.ProcessorCount - 1 // we use one less than the number of processors available
};
```

What we are noticing is that if we run this on a box with 4 cores, we get a decent model where the micro-accuracy is above 90%. However, when we move the same code over to a larger server with 8 cores, we get wildly different results: the micro-accuracy drops to below 60%.
Yikes!

Is there possibly something we are missing in the documentation that would address this?

@justinormont
Contributor

How large is your dataset? The large swing in micro-accuracy could be randomness on top of a very small dataset.

Can you post your training code?

@ArieJones
Author

Do you mean the code for the pipeline?

The datasets are around 15k entries and should be sufficiently sized.

Last night we ran tests constraining the number of CPUs from 7 down to 1. Anything under 4 comes out with the expected accuracy, and anything greater goes south. This was with running and rerunning on the same training data.

We are just trying to understand the reason for this, so that we can properly gauge scaling factors such as how long it takes to train a model. When we use 7 cores we get good throughput (~5 minutes of training time) but lousy results. When we use 4 cores, training is slower (~30+ minutes) but the results are much better.
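
For reference, the kind of sweep described above can be sketched like this (a trimmed-down outline against the v0.x LearningPipeline API; the file names, column names, and the Input/Prediction classes below are placeholders rather than our real schema):

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

// Placeholder schema -- swap in the real feature/label columns.
public class Input
{
    [Column("0")] public float Feature1;
    [Column("1")] public float Feature2;
    [Column("2")] public string Label;
}

public class Prediction
{
    [ColumnName("PredictedLabel")] public string PredictedLabel;
}

public static class ThreadSweep
{
    public static void Run()
    {
        // Train and evaluate the same pipeline with 1..(N-1) SDCA threads,
        // logging micro-accuracy for each setting.
        for (int threads = 1; threads <= Environment.ProcessorCount - 1; threads++)
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader("train.tsv").CreateFrom<Input>(useHeader: true));
            pipeline.Add(new Dictionarizer("Label"));
            pipeline.Add(new ColumnConcatenator("Features", "Feature1", "Feature2"));
            pipeline.Add(new StochasticDualCoordinateAscentClassifier()
            {
                Caching = CachingOptions.Disk,
                MaxIterations = 100,
                LossFunction = new SmoothedHingeLossSDCAClassificationLossFunction(),
                Shuffle = false,
                NumThreads = threads
            });
            pipeline.Add(new PredictedLabelColumnOriginalValueConverter { PredictedLabelColumn = "PredictedLabel" });

            var model = pipeline.Train<Input, Prediction>();

            var metrics = new ClassificationEvaluator()
                .Evaluate(model, new TextLoader("test.tsv").CreateFrom<Input>(useHeader: true));

            Console.WriteLine($"NumThreads={threads}  micro-accuracy={metrics.AccuracyMicro:P1}");
        }
    }
}
```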

@codemzs
Member

codemzs commented Jun 30, 2019

This is expected with linear learners when used with multi-threading.
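
If run-to-run consistency matters more than training speed, one possible workaround (just a sketch reusing the trainer options from the original post, with the thread count pinned; whether the longer training time is acceptable is a separate question) is:

```csharp
// Pinning SDCA to a single worker thread trades throughput for reproducibility:
// updates are applied in a fixed order, so results no longer vary with core count.
var algo = new StochasticDualCoordinateAscentClassifier()
{
    Caching = CachingOptions.Disk,
    MaxIterations = 100,
    LossFunction = new SmoothedHingeLossSDCAClassificationLossFunction(),
    Shuffle = false,
    NumThreads = 1
};
```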

@codemzs codemzs closed this as completed Jun 30, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022