
Conversation

cdoern (Contributor) commented Aug 14, 2025

What does this PR do?

Add complete batches API implementation with protocol, providers, and tests:

Core Infrastructure:

  • Add batches API protocol using OpenAI Batch types directly (see the sketch after this list)
  • Add Api.batches enum value and protocol mapping in resolver
  • Add OpenAI "batch" file purpose support
  • Include proper error handling (ConflictError, ResourceNotFoundError)
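
To make the shape of the API concrete, here is a minimal sketch of what such a protocol could look like, assuming the `Batch` type exported by the `openai` Python package; the method names and parameters are illustrative, not the exact signatures added by this PR.

```python
# Illustrative sketch only: method names and parameters are assumptions,
# not the exact protocol surface added by this PR.
from typing import Protocol

from openai.types import Batch  # the OpenAI Batch type, used directly


class Batches(Protocol):
    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: str,
        metadata: dict[str, str] | None = None,
    ) -> Batch: ...

    async def retrieve_batch(self, batch_id: str) -> Batch: ...

    async def cancel_batch(self, batch_id: str) -> Batch: ...

    async def list_batches(
        self, after: str | None = None, limit: int = 20
    ) -> list[Batch]: ...
```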

Reference Provider:

  • Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list)
  • Implement background batch processing with configurable concurrency (see the concurrency sketch below)
  • Add SQLite KVStore backend for persistence
  • Support /v1/chat/completions endpoint with request validation
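
As a rough illustration of how per-batch concurrency could be bounded during background processing (the class, method names, and request shape below are hypothetical, not the provider's actual code):

```python
# Hypothetical sketch of concurrency-limited background processing;
# names and structure are illustrative, not the provider's actual code.
import asyncio


class BatchWorker:
    def __init__(self, max_concurrent_requests_per_batch: int = 8) -> None:
        # One semaphore per batch caps how many requests are in flight at once.
        self._semaphore = asyncio.Semaphore(max_concurrent_requests_per_batch)

    async def _process_request(self, request: dict) -> dict:
        # Placeholder for the real /v1/chat/completions call.
        await asyncio.sleep(0)
        return {"custom_id": request.get("custom_id"), "response": None}

    async def process_batch(self, requests: list[dict]) -> list[dict]:
        async def bounded(req: dict) -> dict:
            async with self._semaphore:
                return await self._process_request(req)

        # Requests run concurrently, but never more than the configured limit.
        return list(await asyncio.gather(*(bounded(r) for r in requests)))
```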

Comprehensive Test Suite:

  • Add unit tests for provider implementation with validation (an example appears below)
  • Add integration tests for end-to-end batch processing workflows
  • Add error handling tests for validation, malformed inputs, and edge cases
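
A hedged example of the kind of validation test such a suite might contain, assuming pytest-asyncio, a hypothetical `batches_provider` fixture, and a `ValueError` on invalid input; the real tests under tests/unit/providers/batches may differ:

```python
# Illustrative pytest sketch; fixture name, error type, and method signature
# are assumptions, not copied from the real test suite.
import pytest


@pytest.mark.asyncio
async def test_create_batch_rejects_unknown_endpoint(batches_provider):
    # The reference provider supports /v1/chat/completions, so an
    # unsupported endpoint should fail request validation.
    with pytest.raises(ValueError):
        await batches_provider.create_batch(
            input_file_id="file-123",
            endpoint="/v1/unknown",
            completion_window="24h",
        )
```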

Configuration:

  • Add max_concurrent_batches and max_concurrent_requests_per_batch options (see the config sketch below)
  • Add provider documentation with sample configurations
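
The two options could be surfaced roughly as a Pydantic config model; the field names come from this PR's description, but the defaults and bounds below are assumptions:

```python
# Illustrative config sketch; defaults and validation here are assumptions.
from pydantic import BaseModel, Field


class ReferenceBatchesConfig(BaseModel):
    # How many batches may be processed in the background at once.
    max_concurrent_batches: int = Field(default=1, ge=1)
    # How many requests within a single batch may run concurrently.
    max_concurrent_requests_per_batch: int = Field(default=8, ge=1)
```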

Test Plan

Test with:

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066

meta-cla bot added the CLA Signed label on Aug 14, 2025
cdoern (Author) commented Aug 14, 2025

opening this to see if this re-records

ashwinb added the re-record-tests label ("Spin up ollama, inference and record responses for later use") on Aug 14, 2025
cdoern (Author) commented Aug 14, 2025

@mattf FYI these are the same failures I see locally when recording or replaying

ashwinb (Contributor) commented Aug 14, 2025

> opening this to see if this re-records

I added the label now, let's see.

ashwinb (Contributor) commented Aug 14, 2025

Failed. My workflow sucks. Let me look into it in a bit.

cdoern (Author) commented Aug 14, 2025

Ah, the checkout mechanism is wrong. @ashwinb I think just this #3154 will do it, right?

cdoern (Author) commented Aug 14, 2025

which is just the basic checkout mechanism without a ref

ashwinb (Contributor) commented Aug 14, 2025

@cdoern the important issue is that you want to be able to push back to the ref. That needs you to specify the ref. But this can only work in the parent repo (so someone who has write access), not in a fork. I am researching more, but it appears all of this is possible only for repo maintainers right now.

ashwinb (Contributor) commented Aug 14, 2025

OK I learnt a few things. This thing is only possible if:

  • we create a special app within the llamastack app for this purpose
  • this workflow is provided the access token of this app
  • the app is installed by the user in the fork under which the PR was generated

For now, maybe we just need to do this from the maintainers' side, or have a simple script which anyone can run locally themselves (scripts/integration-test.sh is almost there).

cdoern (Author) commented Aug 14, 2025

I can run the recordings locally; the error is the same as the replay ones here.

cdoern (Author) commented Aug 14, 2025

will keep digging

ashwinb (Contributor) commented Aug 15, 2025

Closing this, will take over Matt's PR and commit appropriately.

ashwinb closed this Aug 15, 2025