
enhancement: Add Structured Output support #127


Open · wants to merge 2 commits into main

Conversation

@Azzedde Azzedde commented May 12, 2025

Summary

This PR adds structured output support using Pydantic models, allowing users to define output schemas that LLM responses will be parsed into. Closes #121.

Changes

  • Added get_parsed_completion() function in skllm/llm/gpt/clients/openai/completion.py that:
    • Takes a Pydantic model class as input
    • Configures JSON response format automatically
    • Returns parsed model instances
  • Added comprehensive tests in tests/test_structured_outputs.py covering:
    • Successful parsing into models
    • Field validation
    • Error handling
  • Updated documentation strings and type hints
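The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the real `get_parsed_completion()` in `skllm/llm/gpt/clients/openai/completion.py` also performs the API call and sets the JSON response format; the `Sentiment` schema and the simulated response are hypothetical.

```python
import json
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def get_parsed_completion(raw_json: str, model_cls: Type[T]) -> T:
    """Parse a JSON completion string into the given Pydantic model.

    Simplified sketch: the real function also issues the completion
    request with response_format configured for JSON output.
    """
    return model_cls.model_validate_json(raw_json)


class Sentiment(BaseModel):  # example schema, not from the PR
    label: str
    confidence: float


# Simulated LLM response content
raw = json.dumps({"label": "positive", "confidence": 0.93})
result = get_parsed_completion(raw, Sentiment)
print(result.label, result.confidence)  # → positive 0.93
```

Because the return value is a validated model instance, downstream code gets typed attribute access instead of raw dictionaries.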

Motivation

Structured outputs provide more reliable results by:

  • Constraining LLM outputs to specific schemas
  • Enabling built-in validation through Pydantic
  • Improving developer experience with typed responses

This directly addresses #121 by implementing the requested structured output functionality.

How to test

  1. Define a Pydantic model for your desired output
  2. Call get_parsed_completion() with your model class
  3. Verify the response is properly parsed and validated

Example test cases are provided in the test file.
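Since the project uses unittest, a test exercising the three steps above might look like the following sketch. The `Person` schema is hypothetical, and the parsing is shown directly via Pydantic rather than through the PR's actual helper:

```python
import json
import unittest

from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # hypothetical schema for illustration
    name: str
    age: int


class TestStructuredOutput(unittest.TestCase):
    def test_successful_parse(self):
        # Step 2-3: parse a well-formed JSON completion and check fields
        raw = json.dumps({"name": "Ada", "age": 36})
        person = Person.model_validate_json(raw)
        self.assertEqual(person.name, "Ada")
        self.assertEqual(person.age, 36)

    def test_validation_error(self):
        # A response that violates the schema should raise ValidationError
        bad = json.dumps({"name": "Ada", "age": "not a number"})
        with self.assertRaises(ValidationError):
            Person.model_validate_json(bad)
```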

Risks & considerations

  • Currently implemented only for OpenAI; the approach could be extended to other providers
  • Large/complex schemas may impact performance
  • Default error messages could be more user-friendly

Additional information

The implementation maintains backward compatibility while adding the new functionality. The parsing happens after the standard completion response, so existing code won't be affected.

@OKUA1 (Collaborator) commented May 18, 2025

Hi @Azzedde,

Thank you for the PR. I have a couple of questions. First of all, it seems like the structured completion is only implemented at the client level and not exposed to any of the higher level APIs. Is this intentional? Also, currently the project uses unittest, not pytest, so it is better to stick to it.

@Azzedde (Author) commented May 19, 2025

Hi @OKUA1,

Thank you for your feedback. Regarding the structured completion implementation:

  1. The functionality is intentionally implemented at the client level (OpenAI-specific) while being properly abstracted in the base interface (BaseTextCompletionMixin._get_parsed_completion). This allows for:

    • Provider-specific implementations (currently only OpenAI supports this natively)
    • Consistent interface across providers
    • Flexibility for future provider integrations
  2. I've reverted the tests to unittest format; apologies for initially using pytest. The test cases now properly validate both the base interface and the OpenAI-specific implementation.
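The abstraction described above might be sketched roughly like this. The names `BaseTextCompletionMixin` and `_get_parsed_completion` come from the comment itself, but the method signature and the stubbed provider body are assumptions for illustration:

```python
from abc import ABC, abstractmethod
from typing import Type

from pydantic import BaseModel


class BaseTextCompletionMixin(ABC):
    """Provider-agnostic interface (simplified sketch of the skllm base class)."""

    @abstractmethod
    def _get_parsed_completion(self, prompt: str, schema: Type[BaseModel]) -> BaseModel:
        """Each provider supplies its own structured-output implementation."""


class OpenAICompletionMixin(BaseTextCompletionMixin):
    def _get_parsed_completion(self, prompt: str, schema: Type[BaseModel]) -> BaseModel:
        # The real implementation would call the OpenAI API with a JSON
        # response format; a stub instance illustrates the contract here.
        return schema.model_validate({"answer": f"stubbed reply to: {prompt}"})


class Answer(BaseModel):  # hypothetical schema for the demo
    answer: str


client = OpenAICompletionMixin()
parsed = client._get_parsed_completion("What is skllm?", Answer)
print(parsed.answer)
```

A provider without native structured-output support could still satisfy the same interface, e.g. by prompting for JSON and validating the result.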
