
Use cortex.dev for GPU scaling #128

Closed
lefnire opened this issue Dec 8, 2020 · 1 comment
Labels
🤖AI: All the ML issues (NLP, XGB, etc)
help wanted: Extra attention is needed
🛠Stability: Anything stability-related, usually around server/GPU setup

Comments

@lefnire (Collaborator) commented Dec 8, 2020

Cortex has been popping up for me a lot lately. It's an open-source infra tool for managing GPU scaling within your own cloud, which is phenomenal. I discounted it early on because I thought it was its own hosting solution, and I need to host within AWS for EFS access (one reason I switched off Paperspace). Before I use Cortex, I need support for no activity = 0 GPUs (auto-scale to 0), cortexlabs/cortex#445, which I'm currently handling manually. This ticket would replace #90 and #10.

See their tutorial for transformers. Also note their transformers performance improvements via #62.

@lefnire added the help wanted, 🛠Stability, and 🤖AI labels Dec 8, 2020
@lefnire (Collaborator, Author) commented Dec 12, 2020

Follow-up from that ticket (cortexlabs/cortex#445):

We haven't decided yet on our priority for implementing this feature. One thing that can render it less useful (or at least "awkward") is how long it takes to spin up a GPU instance and install the dependencies on it; we'd have to hold on to the request for 5+ minutes before forwarding it along. A more intuitive approach might be to support an asynchronous API instead, where you make the API request and it responds immediately with an execution ID, and then you can make an additional request to another API to query the status/results for the execution ID (we have #1610 to track this).
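
For illustration, a minimal sketch of that asynchronous pattern from the client's side. The `/submit` and `/status` endpoints, response fields, and base URL below are hypothetical, not Cortex's actual API:

```python
import time
import requests

BASE = "https://api.example.com"  # hypothetical async inference service

def submit_and_wait(payload, poll_interval=5):
    # Submit the request; the service responds immediately with an execution ID
    # instead of holding the connection open while a GPU instance spins up.
    resp = requests.post(f"{BASE}/submit", json=payload)
    resp.raise_for_status()
    execution_id = resp.json()["execution_id"]

    # Poll a separate status endpoint until the execution finishes.
    while True:
        status = requests.get(f"{BASE}/status/{execution_id}").json()
        if status["state"] == "completed":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "execution failed"))
        time.sleep(poll_interval)
```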

In the meantime, in case it's helpful, it is possible to create/delete APIs programmatically via the Cortex CLI or Python client. So if you know you are expecting traffic, or it happens on a regular schedule, you could create/delete APIs accordingly.
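
A rough sketch of that scheduled approach, shelling out to the Cortex CLI (the spec filename and API name are placeholders, and exact CLI syntax varies by Cortex version):

```python
import subprocess

def spin_up():
    # Deploy the API ahead of an expected traffic window.
    # "cortex.yaml" is a placeholder for the API spec file.
    subprocess.run(["cortex", "deploy", "cortex.yaml"], check=True)

def spin_down():
    # Delete the API afterwards so no GPU instances are left running.
    # "my-api" is a placeholder for the API's name in the spec.
    subprocess.run(["cortex", "delete", "my-api"], check=True)
```

Either function could be triggered from cron or a scheduled job, which is essentially the manual scale-to-zero workaround described above.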

Also, we do currently support batch jobs, which is a bit like the asynchronous approach I described, except that autoscaling behaves differently: for batch jobs, you submit a job and indicate how many containers you want to run it on, and then once the job is done, the containers spin down. So it does "scale to 0", but is not designed to handle real-time traffic where each individual request is fairly lightweight, and can come at any time from any source.
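
A sketch of that batch flow for flavor. The endpoint URL is a placeholder and the request/response field names are assumptions about Cortex's batch API, not verified against a specific release; check the batch docs for your version:

```python
import requests

# Placeholder endpoint; Cortex exposes batch APIs behind the cluster's load balancer.
BATCH_ENDPOINT = "https://example.com/my-batch-api"

# Submit a job, indicating how many containers (workers) should process it.
# NOTE: these field names are assumptions, not confirmed API shapes.
job = requests.post(BATCH_ENDPOINT, json={
    "workers": 2,
    "item_list": {"items": [{"text": "first"}, {"text": "second"}], "batch_size": 1},
}).json()

# Poll job status; once the job finishes, its containers spin down (scale to 0).
status = requests.get(BATCH_ENDPOINT, params={"jobID": job["job_id"]}).json()
print(status)
```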

@lefnire moved this to Beta in Gnothi Nov 6, 2022
@lefnire added this to Gnothi Nov 6, 2022
@lefnire closed this as completed May 29, 2023
github-project-automation bot moved this from V1.5 to Done in Gnothi May 29, 2023