
Conversation

mattf
Collaborator

@mattf mattf commented Aug 10, 2025

Add complete batches API implementation with protocol, providers, and tests:

Core Infrastructure:

  • Add batches API protocol using OpenAI Batch types directly
  • Add Api.batches enum value and protocol mapping in resolver
  • Add OpenAI "batch" file purpose support
  • Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:

  • Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list)
  • Implement background batch processing with configurable concurrency
  • Add SQLite KVStore backend for persistence
  • Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:

  • Add unit tests for provider implementation with validation
  • Add integration tests for end-to-end batch processing workflows
  • Add error handling tests for validation, malformed inputs, and edge cases

Configuration:

  • Add max_concurrent_batches and max_concurrent_requests_per_batch options
  • Add provider documentation with sample configurations
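
The two concurrency options above cap how many batches run at once and how many requests each batch fans out concurrently. A minimal sketch of how a `max_concurrent_requests_per_batch` limit can be enforced with a semaphore (illustrative names only, not the provider's actual code):

```python
import asyncio

async def process_batch(requests, max_concurrent_requests_per_batch=2):
    """Process batch requests with a cap on in-flight concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent_requests_per_batch)
    in_flight = 0
    peak = 0  # track the highest observed concurrency, to show the cap holds

    async def process_one(req):
        nonlocal in_flight, peak
        async with semaphore:  # blocks when the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the actual inference call
            in_flight -= 1
            return {"custom_id": req["custom_id"], "status": "completed"}

    results = await asyncio.gather(*(process_one(r) for r in requests))
    return results, peak
```

`asyncio.gather` preserves input order, so per-request results can be matched back to their `custom_id`s when writing the output file.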

Test with:

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066
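
Since the API uses the OpenAI Batch types directly, the batch input file follows the OpenAI-compatible format: one JSON object per line with `custom_id`, `method`, `url`, and `body`, targeting the supported `/v1/chat/completions` endpoint. A sketch of building such a file (the model name is a placeholder):

```python
import json

# Two example requests in the OpenAI-compatible batch input format.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "YOU_PICK",
            "messages": [{"role": "user", "content": f"Question {i}"}],
        },
    }
    for i in range(2)
]

# One JSON object per line (JSONL).
batch_input = "\n".join(json.dumps(r) for r in requests)
```

The resulting JSONL would be uploaded through the files API with the new `"batch"` purpose, and its file id passed when creating the batch; exact client calls depend on your setup.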

@meta-cla bot added the CLA Signed label Aug 10, 2025
@mattf
Collaborator Author

mattf commented Aug 10, 2025

@franciscojavierarceo ptal

@ashwinb
Contributor

ashwinb commented Aug 12, 2025

lgtm

I am not a fan of the "batches" name for an API, but hey, this time I can just ignore that piece and point to someone else :)

Contributor

@ashwinb ashwinb left a comment


Should we note in the documentation somewhere that this API is under development?

@mattf
Collaborator Author

mattf commented Aug 13, 2025

Should we note in the documentation somewhere that this API is under development?

i updated provider_codegen.py to include the API protocol's docstring in the provider docs and added a note about the development status.

fyi, this also picked up additional docs from the agents, inference & eval APIs.

@mattf
Collaborator Author

mattf commented Aug 14, 2025

@ashwinb thoughts on the replay tests failing?

```python
batch_id = f"batch_{uuid.uuid4().hex[:16]}"
current_time = int(time.time())

batch = BatchObject(
```
Collaborator

@franciscojavierarceo franciscojavierarceo Aug 14, 2025


It could be useful to generate an idempotency key for retries and avoiding processing the same file more than once.

Collaborator Author


❤️ idempotency is hugely valuable for apis like this.

since files are immutable, the params can serve as that id.

let me add an issue for that feature.

Collaborator

@franciscojavierarceo franciscojavierarceo Aug 14, 2025


since files are immutable, the params can serve as that id.

💯
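
The point above — immutable input files mean the create-batch parameters themselves identify the work — can be sketched as a deterministic key derived from those parameters. This is a hypothetical illustration of the proposed feature, not code from this PR:

```python
import hashlib
import json

def idempotency_key(input_file_id, endpoint, completion_window, metadata=None):
    """Derive a stable key from create-batch params: identical params -> same key.

    Because input files are immutable, retrying a create with the same
    parameters can return the existing batch instead of processing it again.
    """
    canonical = json.dumps(
        {
            "input_file_id": input_file_id,
            "endpoint": endpoint,
            "completion_window": completion_window,
            "metadata": metadata or {},
        },
        sort_keys=True,  # canonical ordering so equivalent params hash equally
    )
    return "batch_" + hashlib.sha256(canonical.encode()).hexdigest()[:16]
```

A retried `create` call would compute the key first, look it up in the KVStore, and return the stored batch on a hit rather than reprocessing the file.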


Collaborator

@franciscojavierarceo franciscojavierarceo left a comment


🚀

@mattf mattf merged commit de69216 into llamastack:main Aug 14, 2025
32 checks passed
ashwinb added a commit that referenced this pull request Aug 14, 2025
cdoern pushed a commit to cdoern/llama-stack that referenced this pull request Aug 14, 2025