
enhancement: Add Structured Output support #127


Open · wants to merge 2 commits into main

Conversation

@Azzedde Azzedde commented May 12, 2025

Summary

This PR adds structured output support using Pydantic models, allowing users to define output schemas that LLM responses will be parsed into. Closes #121.

Changes

  • Added get_parsed_completion() function in skllm/llm/gpt/clients/openai/completion.py that:
    • Takes a Pydantic model class as input
    • Configures JSON response format automatically
    • Returns parsed model instances
  • Added comprehensive tests in tests/test_structured_outputs.py covering:
    • Successful parsing into models
    • Field validation
    • Error handling
  • Updated documentation strings and type hints
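The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the real `get_parsed_completion()` in `skllm/llm/gpt/clients/openai/completion.py` also performs the API call and sets the JSON response format; the `Sentiment` schema and the simulated response are hypothetical.

```python
import json
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def get_parsed_completion(raw_json: str, model_cls: Type[T]) -> T:
    """Parse a JSON completion string into the given Pydantic model.

    Simplified sketch: the real function also issues the completion
    request with response_format configured for JSON output.
    """
    return model_cls.model_validate_json(raw_json)


class Sentiment(BaseModel):  # example schema, not from the PR
    label: str
    confidence: float


# Simulated LLM response content
raw = json.dumps({"label": "positive", "confidence": 0.93})
result = get_parsed_completion(raw, Sentiment)
print(result.label, result.confidence)  # → positive 0.93
```

Because the return value is a validated model instance, downstream code gets typed attribute access instead of raw dictionaries.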

Motivation

Structured outputs provide more reliable results by:

  • Constraining LLM outputs to specific schemas
  • Enabling built-in validation through Pydantic
  • Improving developer experience with typed responses

This directly addresses #121 by implementing the requested structured output functionality.

How to test

  1. Define a Pydantic model for your desired output
  2. Call get_parsed_completion() with your model class
  3. Verify the response is properly parsed and validated

Example test cases are provided in the test file.
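Since the project uses unittest, a test exercising the three steps above might look like the following sketch. The `Person` schema is hypothetical, and the parsing is shown directly via Pydantic rather than through the PR's actual helper:

```python
import json
import unittest

from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # hypothetical schema for illustration
    name: str
    age: int


class TestStructuredOutput(unittest.TestCase):
    def test_successful_parse(self):
        # Step 2-3: parse a well-formed JSON completion and check fields
        raw = json.dumps({"name": "Ada", "age": 36})
        person = Person.model_validate_json(raw)
        self.assertEqual(person.name, "Ada")
        self.assertEqual(person.age, 36)

    def test_validation_error(self):
        # A response that violates the schema should raise ValidationError
        bad = json.dumps({"name": "Ada", "age": "not a number"})
        with self.assertRaises(ValidationError):
            Person.model_validate_json(bad)
```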

Risks & considerations

  • Currently implemented only for OpenAI; the approach could be extended to other providers
  • Large/complex schemas may impact performance
  • Default error messages could be more user-friendly

Additional information

The implementation maintains backward compatibility while adding the new functionality. The parsing happens after the standard completion response, so existing code won't be affected.

@OKUA1 (Collaborator) commented May 18, 2025

Hi @Azzedde,

Thank you for the PR. I have a couple of questions. First of all, it seems like the structured completion is only implemented at the client level and not exposed to any of the higher level APIs. Is this intentional? Also, currently the project uses unittest, not pytest, so it is better to stick to it.

@Azzedde (Author) commented May 19, 2025

Hi @OKUA1,

Thank you for your feedback. Regarding the structured completion implementation:

  1. The functionality is intentionally implemented at the client level (OpenAI-specific) while being properly abstracted in the base interface (BaseTextCompletionMixin._get_parsed_completion). This allows for:

    • Provider-specific implementations (currently only OpenAI supports this natively)
    • Consistent interface across providers
    • Flexibility for future provider integrations
  2. I've reverted the tests to unittest format; apologies for initially using pytest. The test cases now properly validate both the base interface and the OpenAI-specific implementation.
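The abstraction described above might be sketched roughly like this. The names `BaseTextCompletionMixin` and `_get_parsed_completion` come from the comment itself, but the method signature and the stubbed provider body are assumptions for illustration:

```python
from abc import ABC, abstractmethod
from typing import Type

from pydantic import BaseModel


class BaseTextCompletionMixin(ABC):
    """Provider-agnostic interface (simplified sketch of the skllm base class)."""

    @abstractmethod
    def _get_parsed_completion(self, prompt: str, schema: Type[BaseModel]) -> BaseModel:
        """Each provider supplies its own structured-output implementation."""


class OpenAICompletionMixin(BaseTextCompletionMixin):
    def _get_parsed_completion(self, prompt: str, schema: Type[BaseModel]) -> BaseModel:
        # The real implementation would call the OpenAI API with a JSON
        # response format; a stub instance illustrates the contract here.
        return schema.model_validate({"answer": f"stubbed reply to: {prompt}"})


class Answer(BaseModel):  # hypothetical schema for the demo
    answer: str


client = OpenAICompletionMixin()
parsed = client._get_parsed_completion("What is skllm?", Answer)
print(parsed.answer)
```

A provider without native structured-output support could still satisfy the same interface, e.g. by prompting for JSON and validating the result.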
