feat: Add inputAudioTranscription support to Java ADK #463
Conversation
@jinnigu Thank you for contributing this! Is there any way to illustrate that this actually really 😆 fully works, in this PR? The unit tests are... well, unit tests. Would a full-blown integration test for this be possible? Or, how would you feel about it if I invited you to add, as part of this PR, a very (most) simple "MVP" in `tutorials/audio`, with just a super simple `LlmAgent` (without even any sub-agents) and just an `AdkWebServer.start()`, which allows us to "see this work in action"? That would be awesome!
```diff
-    if (!CollectionUtils.isNullOrEmpty(runConfig.responseModalities())
-        && liveRequestQueue.isPresent()) {
+    if (liveRequestQueue.isPresent() && !this.agent.subAgents().isEmpty()) {
+      // Parity with Python: apply modality defaults and transcription settings
+      // only for multi-agent live scenarios.
```
@jinnigu The inline comment and the code don't seem to align here? The "text" says "apply modality defaults", but then this removes `!CollectionUtils.isNullOrEmpty(runConfig.responseModalities())`... is that intentional? (It may well be; I'm not entirely sure why this was originally like this, but it seems worth double-checking.) Also, why would we limit transcription to multi-agent live scenarios only? I would personally love to use this even for a very simple, trivial only-`LlmAgent` use case... you speak to it and get a persistent transcript in your session store; that's very cool! I'd love to use this e.g. in my (personal) https://docs.enola.dev project, but I don't see why it needs to be limited to work only if `!this.agent.subAgents().isEmpty()`.
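The two gating strategies being debated can be sketched with plain Java stubs. Everything below is illustrative only: `RunConfig`, `appliesTranscriptionCurrent`, and `appliesTranscriptionSuggested` are hypothetical stand-ins, not the actual ADK API.

```java
import java.util.Optional;

public class TranscriptionGateSketch {
    // Hypothetical stand-in for the real RunConfig; holds only the one
    // setting relevant to this sketch.
    record RunConfig(Object inputAudioTranscription) {}

    // As reviewed in the PR: transcription is applied only for live runs
    // with at least one sub-agent.
    static boolean appliesTranscriptionCurrent(
            Optional<Object> liveRequestQueue, int subAgentCount, RunConfig config) {
        return liveRequestQueue.isPresent()
                && subAgentCount > 0
                && config.inputAudioTranscription() != null;
    }

    // Reviewer's suggestion: drop the sub-agent requirement, so a trivial
    // single-LlmAgent live session also gets a persistent transcript.
    static boolean appliesTranscriptionSuggested(
            Optional<Object> liveRequestQueue, RunConfig config) {
        return liveRequestQueue.isPresent()
                && config.inputAudioTranscription() != null;
    }

    public static void main(String[] args) {
        RunConfig config = new RunConfig(new Object());
        Optional<Object> queue = Optional.of(new Object());
        // With zero sub-agents, the current gate skips transcription...
        assert !appliesTranscriptionCurrent(queue, 0, config);
        // ...while the suggested gate still applies it.
        assert appliesTranscriptionSuggested(queue, config);
        System.out.println("ok");
    }
}
```

The sketch just makes the difference concrete: the only divergence between the two gates is the `subAgentCount > 0` clause the reviewer is questioning.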
/gemini review
Code Review

This pull request successfully adds `inputAudioTranscription` support to the Java ADK, achieving feature parity with the Python version. The changes are well-structured, including updates to `RunConfig`, the `Basic` flow, and the `Runner`. A significant improvement is the fix for an unreachable code block in `Runner.java`, which enhances correctness. The new functionality is also thoroughly covered by unit tests. I have one suggestion to refactor a small portion of the logic in `Runner.java` to improve maintainability by removing duplicated code. Overall, this is a solid and valuable contribution.
Summary

Adds `inputAudioTranscription` support to the Java ADK to achieve feature parity with Python. When enabled, the live connect config requests model-side transcription of input audio into text, allowing real-time processing of spoken input in live streaming scenarios.

Changes

Core Implementation
- Added an `inputAudioTranscription` field with getter/setter and builder support
- Mapped `RunConfig.inputAudioTranscription` to `LiveConnectConfig.inputTranscription` for model-side transcription

Bug Fix
- Fixed unreachable code in `newInvocationContextForLive()`, where the outer check `!CollectionUtils.isNullOrEmpty(runConfig.responseModalities())` (NOT empty) made the inner check `CollectionUtils.isNullOrEmpty(runConfig.responseModalities())` (IS empty) impossible to reach. This prevented the "default to AUDIO modality" logic from ever executing.

Behavior Alignment with Python
- Applies `inputAudioTranscription` only for live multi-agent runs (when `agent.subAgents()` is non-empty)
- Sets `outputAudioTranscription` when response modalities imply audio usage

Testing
- `RunConfig` transcription field handling
- `Basic` flow mapping to `LiveConnectConfig`