[LLM] Add code llama example #3050
Conversation
Nice! Thanks @Michaelvll!! Left some comments. Tried out the `sky launch` bit, works nicely. Trying out with `sky serve` now.
As shown, the service is now backed by 2 replicas, one on Azure and one on GCP, and the accelerator
type is chosen to be **the cheapest and available one** on the clouds. That is, it maximizes the
availability of the service while minimizing the cost.
What does 'maximize availability' mean here? Does it mean finding an instance faster?
More like it increases the chance of getting the resources, since the candidate resources are drawn from multiple resource pools.
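A minimal sketch of the idea in SkyPilot task YAML (the accelerator set and port below are illustrative, not the exact values in the example): listing several candidate accelerators gives the optimizer multiple resource pools to draw from, so each replica lands on whichever option is cheapest and currently available across the enabled clouds.

```yaml
# Illustrative resources section (values are placeholders, not the
# example's actual config). The accelerators listed are alternatives;
# SkyPilot picks the cheapest currently-available one on any enabled
# cloud (e.g., GCP or Azure) for each replica.
resources:
  accelerators: {L4:8, A10g:8, A100:4, A100:8}
  ports: 8000
```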
Co-authored-by: Romil Bhardwaj <[email protected]>
Co-authored-by: Ziming Mao <[email protected]>
LGTM, thanks @Michaelvll! This is very cool!
llm/codellama/README.md (Outdated)

* No one else sees your chat history


Should we put this GIF at the top, say at L2, to catch the eye?
@@ -38,7 +38,6 @@ run: |
-      python3 -m fastchat.serve.controller --host 0.0.0.0 --port ${CONTROLLER_PORT} > ~/controller.log 2>&1 &
       cd FastChat
I am assuming this is intended : )
Yep, it seems we forgot to remove this line earlier, when we changed the dependency.
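For reference, a rough sketch (not the actual file contents) of how the run block reads once that leftover line is dropped:

```yaml
run: |
  # The standalone FastChat controller launch became stale after the
  # dependency change and is removed in this PR:
  #   python3 -m fastchat.serve.controller --host 0.0.0.0 --port ${CONTROLLER_PORT} > ~/controller.log 2>&1 &
  cd FastChat
  # ... remaining setup/serving commands from the example ...
```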
Co-authored-by: Romil Bhardwaj <[email protected]>
Added the example for Tabby. PTAL @romilbhardwaj @MaoZiming : )
After #3048 is fixed, we should be able to make the model path an env var.
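A rough sketch of what that could look like once #3048 lands (the variable name, default model, and serving command below are assumptions, not the final YAML): the model path moves into an `envs` entry that can be overridden at launch time, e.g. with `--env`.

```yaml
# Hypothetical snippet: expose the model path as an env var so users can
# override it, e.g. `sky launch serve.yaml --env MODEL_NAME=...`.
envs:
  MODEL_NAME: codellama/CodeLlama-70b-Instruct-hf

run: |
  # The serving command reads the model from the env var instead of
  # hard-coding the path.
  python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_NAME --host 0.0.0.0 --port 8000
```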
Tested (run the relevant ones):
- `bash format.sh`
- `pytest tests/test_smoke.py`
- `pytest tests/test_smoke.py::test_fill_in_the_name`
- `bash tests/backward_comaptibility_tests.sh`