@codesoda codesoda commented Jun 30, 2025

🚀 Summary

Syncs async-openai realtime types with the latest OpenAI Realtime API (June 2025).
Adds richer request/response configs, new client & server events, extra enums for models / voices / modalities, plus tracing & noise-reduction support.


✨ What’s new

  • Client events

    • Added ResponseConfig, OutputAudioBufferClearEvent, ConversationItemRetrieveEvent.
    • ResponseCancelEvent gains response_id.
    • ResponseCreateEvent now uses ResponseConfig instead of SessionResource.
  • Server events

    • Added output_audio_buffer.cleared, conversation.item.input_audio_transcription.delta, conversation.item.retrieved.
    • Fixed typo: InputAudioBufferCommitedEvent → InputAudioBufferCommittedEvent.
  • Response resource

    • New fields: finish_reason, created_at.
    • New finish reasons: TokenLimit, FunctionCall.
  • Session resource

    • New enums: RealtimeModel, Modality, NoiseReductionType.
    • Added fields: speed, input_audio_noise_reduction, tracing.
    • model is now RealtimeModel; modalities is Vec<Modality>.
  • Turn detection

    • Introduced semantic_vad mode with create_response and interrupt_response flags.
  • Audio

    • Unified enum names (g711_ulaw, g711_alaw).
    • Added InputAudioNoiseReduction.
  • Tooling

    • Wired ToolChoice & ToolDefinition into ResponseConfig.
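
To illustrate the semantic_vad turn-detection mode listed above, here is a minimal, dependency-free sketch of the payload shape. `SemanticVad`, `Eagerness`, and `to_json` are hypothetical stand-ins written for this example, not the crate's actual definitions (which presumably serialize through serde), and the `low` / `medium` / `high` / `auto` eagerness values are an assumption about the wire format.

```rust
/// Hypothetical stand-in for the eagerness setting; the value names are assumed.
#[derive(Debug, Clone, Copy)]
enum Eagerness {
    Low,
    Medium,
    High,
    Auto,
}

impl Eagerness {
    /// Lowercase name as it would appear in the JSON payload.
    fn as_str(self) -> &'static str {
        match self {
            Eagerness::Low => "low",
            Eagerness::Medium => "medium",
            Eagerness::High => "high",
            Eagerness::Auto => "auto",
        }
    }
}

/// Hypothetical stand-in for the semantic_vad turn-detection settings.
#[derive(Debug, Clone, Copy)]
struct SemanticVad {
    eagerness: Eagerness,
    /// Whether a response is created automatically when a turn ends.
    create_response: bool,
    /// Whether new user speech interrupts an in-progress response.
    interrupt_response: bool,
}

impl SemanticVad {
    /// Renders the settings as a JSON object by hand, to avoid external crates.
    fn to_json(&self) -> String {
        format!(
            r#"{{"type":"semantic_vad","eagerness":"{}","create_response":{},"interrupt_response":{}}}"#,
            self.eagerness.as_str(),
            self.create_response,
            self.interrupt_response
        )
    }
}
```

Calling `to_json()` on a value with `Eagerness::Low`, `create_response: true`, and `interrupt_response: false` yields a single JSON object whose `type` is `semantic_vad`.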

⚠️ Breaking changes

  • ResponseCreateEvent: response now expects ResponseConfig, not SessionResource.
  • Enum casing: g711-ulaw / g711-alaw → g711_ulaw / g711_alaw.
  • Event rename: InputAudioBufferCommitedEvent → InputAudioBufferCommittedEvent.
  • Typed model field: SessionResource.model is now RealtimeModel (no longer a free-form String).
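
For callers that still hold the old hyphenated audio format strings (for example in stored configuration), the enum casing change above could be bridged roughly as follows. This is only a sketch: `AudioFormat` here is a local stand-in, `wire_name` and `parse_lenient` are hypothetical helpers that do not exist in the crate, and the `pcm16` variant is assumed rather than taken from this PR.

```rust
/// Local stand-in for the audio format enum (not the crate's definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioFormat {
    Pcm16, // assumed variant, not mentioned in this PR
    G711Ulaw,
    G711Alaw,
}

impl AudioFormat {
    /// Serialized name after this PR: underscores instead of hyphens.
    fn wire_name(self) -> &'static str {
        match self {
            AudioFormat::Pcm16 => "pcm16",
            AudioFormat::G711Ulaw => "g711_ulaw",
            AudioFormat::G711Alaw => "g711_alaw",
        }
    }

    /// Accepts both the old hyphenated and the new underscored spelling.
    fn parse_lenient(s: &str) -> Option<AudioFormat> {
        // Normalize hyphens to underscores, then match the new names only.
        match s.replace('-', "_").as_str() {
            "pcm16" => Some(AudioFormat::Pcm16),
            "g711_ulaw" => Some(AudioFormat::G711Ulaw),
            "g711_alaw" => Some(AudioFormat::G711Alaw),
            _ => None,
        }
    }
}
```

Normalizing at the parsing boundary keeps the old spelling out of the rest of the code, so only `wire_name` decides what is sent.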

codesoda and others added 9 commits June 23, 2025 16:28
- Added `Cancelled` variant to `ResponseStatusDetail` enum for better handling of cancelled responses.
- Introduced `LogProb` struct to capture log probability information for transcribed tokens.
- Updated `ConversationItemInputAudioTranscriptionCompletedEvent` and `ConversationItemInputAudioTranscriptionDeltaEvent` to include optional `logprobs` for per-token log probability data.
- Enhanced `AudioTranscription` struct with optional fields for `language`, `model`, and `prompt` to improve transcription accuracy and customization.
- Added new `SemanticVAD` option in the `TurnDetection` enum to control model response eagerness.
- Expanded `RealtimeVoice` enum with additional voice options for more variety in audio responses.
- Changed enum variants for `AudioFormat` to use underscores instead of hyphens in their serialized names.
- Updated `G711ULAW` from `g711-ulaw` to `g711_ulaw` and `G711ALAW` from `g711-alaw` to `g711_alaw` for improved clarity and adherence to naming conventions.
… optional

- Renamed `code` to `error_type` and made both `error_type` and `code` optional in the `FailedError` struct.
- Updated the serialization to use `type` for `error_type` to avoid conflicts with Rust keywords.
…mittedEvent

Updated the `InputAudioBufferCommittedEvent` struct to change the type of `previous_item_id` from `String` to `Option<String>`. This modification allows for greater flexibility in handling events where a preceding item may not exist.
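
Code that previously read `previous_item_id` as a plain `String` now has to handle the missing case. A minimal sketch of what that might look like, using a trimmed-down stand-in struct (only two fields are modeled) and a hypothetical `describe_position` helper:

```rust
/// Trimmed-down stand-in for the event; the real struct has more fields.
#[derive(Debug, Clone)]
struct InputAudioBufferCommittedEvent {
    /// ID of the preceding item, or None when there is no preceding item.
    previous_item_id: Option<String>,
    /// ID of the user message item created from the committed buffer.
    item_id: String,
}

/// Hypothetical helper: describes where the new item sits in the conversation.
fn describe_position(ev: &InputAudioBufferCommittedEvent) -> String {
    match ev.previous_item_id.as_deref() {
        Some(prev) => format!("item {} follows {}", ev.item_id, prev),
        None => format!("item {} has no preceding item", ev.item_id),
    }
}
```

Matching on `as_deref()` borrows the inner string rather than cloning it, which keeps the handler cheap when events arrive frequently.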
This commit introduces a new variant, InputImage, to the ItemContentType enum in the realtime item module. This enhancement allows for the representation of image inputs, expanding the capabilities of the system to handle additional content types.