Simplify AI Chat Response Streaming #1167

Merged: 6 commits from simplify-chat-response-streaming into master on Apr 21, 2025

Conversation

@debanjum (Member) commented on Apr 21, 2025

Reason

  • Simplify the code and logic for streaming chat responses by relying solely on the asyncio event loop.
  • Reduce thread-management overhead to improve efficiency and throughput (where possible).

Details

  • Use async/await with no threading when generating chat responses via the OpenAI, Gemini, and Anthropic model APIs (see the first sketch below).
  • Keep threading for the offline chat model, since llama-cpp doesn't support async streaming yet (see the second sketch below).
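
A minimal sketch of the fully async path, assuming the official openai Python SDK (v1+); the model name and messages are illustrative, not taken from the PR:

```python
import asyncio
from typing import AsyncIterator

from openai import AsyncOpenAI


async def stream_chat_response(messages: list[dict]) -> AsyncIterator[str]:
    # The async client awaits on network I/O, so the event loop stays
    # free to serve other requests while tokens trickle in.
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not from the PR
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


async def main() -> None:
    async for token in stream_chat_response([{"role": "user", "content": "Hello!"}]):
        print(token, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
```

The same pattern applies to the Gemini and Anthropic clients, each of which ships an async streaming interface of its own.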
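And a minimal sketch of the threaded offline path, assuming llama-cpp-python; the queue-bridging helper is a hypothetical illustration of the pattern, not the PR's actual code:

```python
import asyncio
from typing import AsyncIterator

from llama_cpp import Llama

_SENTINEL = object()


async def stream_offline_response(llm: Llama, messages: list[dict]) -> AsyncIterator[str]:
    # llama-cpp's stream is a blocking, synchronous generator, so run it
    # in a worker thread and hand chunks back to the event loop via a queue.
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def produce() -> None:
        try:
            for chunk in llm.create_chat_completion(messages=messages, stream=True):
                text = chunk["choices"][0]["delta"].get("content")
                if text:
                    # put_nowait is not thread-safe on its own; schedule it
                    # on the event loop from this worker thread instead.
                    loop.call_soon_threadsafe(queue.put_nowait, text)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _SENTINEL)

    producer = asyncio.create_task(asyncio.to_thread(produce))
    while (item := await queue.get()) is not _SENTINEL:
        yield item
    await producer


async def main() -> None:
    llm = Llama(model_path="models/chat-model.gguf")  # hypothetical model path
    async for token in stream_offline_response(llm, [{"role": "user", "content": "Hi!"}]):
        print(token, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
```

Confining the thread to this one adapter keeps the rest of the streaming pipeline on a single async code path, which is the simplification the PR is after.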

@debanjum added the upgrade (New feature or request) label on Apr 21, 2025
@debanjum merged commit f929ff8 into master on Apr 21, 2025 (10 checks passed)
@debanjum deleted the simplify-chat-response-streaming branch on April 21, 2025 at 08:58