Scale up/down GPU instances based on n_jobs #90

lefnire · 2020-10-24T21:49:21Z

Currently only one AWS Batch instance is spun up while users are active (it spins down after inactivity). This needs to scale.

Submit additional Batch jobs based on n_new_jobs vs. n_machines in ("pending", "on"). Each instance can handle 2 jobs at a time (this will improve after Reconsider current BERT models #28), so the check is something like `if n_new_jobs / n_machines / 2 > 1: cloud_up()`.
Apply the inverse check to spin instances down. Instead of each instance handling its own shutdown, have server_jobs submit a kill job: only one instance will pick up that job, and server_jobs will reconsider submitting another kill job after that instance goes down.
(small) If the queue is empty, the ETA reads "eta 0 seconds"; it should be total+30.
Add a dict (map) of how many concurrent jobs each job type can handle. E.g., summarization/sentiment can handle 2-3 jobs at once, while question-answering can handle only one (it can't run concurrently with anything else). Also add a config.yml option to override this per machine, in case some machines have more or less GPU compute.
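A minimal sketch of the scale-up/down check and the per-job concurrency map described above, assuming server_jobs runs a periodic tick that knows the number of new jobs and each machine's status. Every name here (`scale_decision`, `can_start`, `DEFAULT_CONCURRENCY`, the `kill_pending` flag, `min_machines`, the `overrides` dict) is hypothetical and not Gnothi's actual code; the concurrency values are guesses from the ranges given in the issue.

```python
# Capacity per instance today, per the issue (2 jobs at a time).
JOBS_PER_MACHINE = 2

# Hypothetical per-job-type concurrency limits. The issue suggests 2-3 for
# summarization/sentiment (2 is a conservative guess) and 1 for QA.
DEFAULT_CONCURRENCY = {
    "summarization": 2,
    "sentiment": 2,
    "question-answering": 1,  # must not share the GPU with anything else
}


def count_machines(statuses):
    """Count machines that are booting or running ("pending" or "on")."""
    return sum(1 for s in statuses if s in ("pending", "on"))


def scale_decision(n_new_jobs, statuses, kill_pending=False,
                   jobs_per_machine=JOBS_PER_MACHINE, min_machines=1):
    """Return "up", "down", or "hold" for one server_jobs tick.

    Up:   more new jobs than the current machines can absorb.
    Down: one fewer machine could still absorb them (the inverse check).
    The gap between the two thresholds keeps it from flapping up/down.
    """
    n_machines = count_machines(statuses)
    # Same as n_new_jobs / n_machines / jobs_per_machine > 1, but also
    # handles n_machines == 0 without dividing by zero.
    if n_new_jobs > n_machines * jobs_per_machine:
        return "up"
    if (not kill_pending  # only one kill job in flight at a time
            and n_machines > min_machines
            and n_new_jobs <= (n_machines - 1) * jobs_per_machine):
        return "down"  # submit a single kill job; one instance takes it
    return "hold"


def concurrency_limit(job_type, overrides=None):
    """Per-machine limit for job_type; overrides would come from config.yml."""
    limits = {**DEFAULT_CONCURRENCY, **(overrides or {})}
    return limits.get(job_type, 1)  # unknown job types run alone, to be safe


def can_start(job_type, running, overrides=None):
    """Can a job of job_type start on a machine already running `running`?"""
    if not running:
        return True
    # An exclusive (limit 1) job never shares a machine, in either direction.
    if concurrency_limit(job_type, overrides) == 1:
        return False
    if any(concurrency_limit(j, overrides) == 1 for j in running):
        return False
    return len(running) < concurrency_limit(job_type, overrides)
```

For example, `scale_decision(5, ["on", "on"])` asks for another machine, while `scale_decision(2, ["on", "on"])` asks for a kill job. A beefier machine could pass something like `overrides={"sentiment": 3}`, loaded from a hypothetical `concurrency:` section in its config.yml.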


lefnire added the labels help wanted (Extra attention is needed), 🛠Stability (Anything stability-related, usually around server/GPU setup), and 🤖AI (All the ML issues (NLP, XGB, etc)) on Oct 24, 2020

lefnire mentioned this issue on Oct 24, 2020 in GPU concurrency / scaling / stability #10 (Closed, 13 tasks)

lefnire mentioned this issue on Dec 8, 2020 in Use cortext.dev for GPU scaling #128 (Closed)

lefnire moved this to Beta in Gnothi Nov 6, 2022

lefnire added this to Gnothi Nov 6, 2022

lefnire closed this as completed May 29, 2023

github-project-automation bot moved this from V1.5 to Done in Gnothi May 29, 2023
