Scale up/down GPU instances based on n_jobs #90
Labels
🤖AI
All the ML issues (NLP, XGB, etc)
help wanted
Extra attention is needed
🛠Stability
Anything stability-related, usually around server/GPU setup
Currently only spins up 1 AWS Batch instance when users are active (spins down after inactivity). Need to scale this.
To scale up, compare `n_new_jobs` vs `n_machines in ("pending", "on")`. Each instance can handle 2 jobs at a time (will improve after Reconsider current BERT models #28), so something like `if n_new_jobs/n_machines/2 > 1: cloud_up()`.
To scale down, submit a `kill` job from server_jobs (since only one instance will take that job, and server_jobs will re-consider submitting another kill-job after one goes down).
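A rough sketch of how that decision could look, not the actual implementation. The function names `instances_to_add` and `kill_jobs_to_submit` are hypothetical, and so is the `min_machines` floor; only the 2-jobs-per-instance figure and the one-kill-job-at-a-time idea come from this issue. It uses a ceiling instead of the ratio check above so that `n_machines == 0` does not divide by zero.

```python
import math

# From the issue: each instance currently handles 2 jobs at a time.
JOBS_PER_INSTANCE = 2


def instances_needed(n_new_jobs, jobs_per_instance=JOBS_PER_INSTANCE):
    """Number of instances required to run all new jobs concurrently."""
    return math.ceil(n_new_jobs / jobs_per_instance)


def instances_to_add(n_new_jobs, n_machines, jobs_per_instance=JOBS_PER_INSTANCE):
    """How many extra instances to spin up (hypothetical helper).

    n_machines is the count of instances whose status is "pending" or "on".
    """
    return max(0, instances_needed(n_new_jobs, jobs_per_instance) - n_machines)


def kill_jobs_to_submit(n_new_jobs, n_machines,
                        jobs_per_instance=JOBS_PER_INSTANCE, min_machines=0):
    """Return 1 if there is spare capacity, else 0 (hypothetical helper).

    At most one kill job is submitted per pass: only one instance takes it,
    and the next pass re-evaluates once that instance has gone down.
    min_machines is an assumed floor, not something the issue specifies.
    """
    needed = max(min_machines, instances_needed(n_new_jobs, jobs_per_instance))
    return 1 if n_machines > needed else 0
```

With 5 new jobs and 1 machine pending or on, `instances_to_add(5, 1)` asks for 2 more instances; with no new jobs and 2 machines up, `kill_jobs_to_submit(0, 2)` submits a single kill job and leaves the second machine for the next pass.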