59 changes: 23 additions & 36 deletions .mock/definition/tts/__package__.yml
@@ -165,15 +165,9 @@ service:
  - text: >-
  Beauty is no quality in things themselves: It exists merely in
  the mind which contemplates them.
- description: >-
- Middle-aged masculine voice with a clear, rhythmic Scots lilt,
- rounded vowels, and a warm, steady tone with an articulate,
- academic quality.
- context:
- generation_id: 09ad914d-8e7f-40f8-a279-e34f07f7dab2
- format:
- type: mp3
- num_generations: 1
+ voice:
+ name: Male English Actor
+ provider: HUME_AI
  synthesize-json-streaming:
  path: /v0/tts/stream/json
  method: POST
@@ -206,19 +200,9 @@ service:
  - text: >-
  Beauty is no quality in things themselves: It exists merely in
  the mind which contemplates them.
- description: >-
- Middle-aged masculine voice with a clear, rhythmic Scots lilt,
- rounded vowels, and a warm, steady tone with an articulate,
- academic quality.
- context:
- utterances:
- - text: How can people see beauty so differently?
- description: >-
- A curious student with a clear and respectful tone, seeking
- clarification on Hume's ideas with a straightforward
- question.
- format:
- type: mp3
+ voice:
+ name: Male English Actor
+ provider: HUME_AI
  source:
  openapi: tts-openapi.yml
  types:
@@ -390,22 +374,19 @@ types:
  see our documentation on [instant
  mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).

- - Dynamic voice generation is not supported with this mode; a
- predefined
+ - A
  [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice)
- must be specified in your request.
+ must be specified when instant mode is enabled. Dynamic voice
+ generation is not supported with this mode.

- - This mode is only supported for streaming endpoints (e.g.,
+ - Instant mode is only supported for streaming endpoints (e.g.,
  [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming),
  [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).

  - Ensure only a single generation is requested
  ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations)
  must be `1` or omitted).
-
- - With `instant_mode` enabled, **requests incur a 10% higher cost**
- due to increased compute and resource requirements.
- default: false
+ default: true
  source:
  openapi: tts-openapi.yml
  ReturnTts:
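Because this hunk flips the `instant_mode` default to `true`, the constraints listed in the docs now apply to any streaming request that omits the field. The following is a minimal sketch of a client-side pre-flight check, not part of the hume SDK: the helper name `instant_mode_problems`, the dict-shaped request body, and the assumption that the two paths named in the docs are the only streaming endpoints are all illustrative. The server's actual behaviour may differ.

```python
from typing import Any, Dict, List

# Streaming paths named in the instant_mode docs (assumed exhaustive here;
# the docs only list them as examples).
STREAMING_PATHS = {"/v0/tts/stream/json", "/v0/tts/stream/file"}


def instant_mode_problems(path: str, body: Dict[str, Any]) -> List[str]:
    """Return the documented instant-mode constraints that `body` violates.

    Hypothetical helper. An omitted `instant_mode` is treated as enabled,
    following the `default: true` introduced in this hunk.
    """
    if not body.get("instant_mode", True):
        return []  # the constraints only apply when instant mode is on

    problems: List[str] = []
    if path not in STREAMING_PATHS:
        problems.append("instant mode is only supported for streaming endpoints")
    for i, utterance in enumerate(body.get("utterances", [])):
        if not utterance.get("voice"):
            problems.append(f"utterances[{i}] has no voice")
    if body.get("num_generations", 1) != 1:
        problems.append("num_generations must be 1 or omitted")
    return problems
```

A request that relies on a `description` alone to generate a voice would need `instant_mode` set to `False` explicitly to pass this check.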
@@ -514,14 +495,20 @@ types:
  docs: >-
  Natural language instructions describing how the synthesized speech
  should sound, including but not limited to tone, intonation, pacing,
- and accent (e.g., 'a soft, gentle voice with a strong British
- accent').
+ and accent.

- - If a Voice is specified in the request, this description serves as
- acting instructions. For tips on how to effectively guide speech
- delivery, see our guide on [Acting
+
+ **This field behaves differently depending on whether a voice is
+ specified**:
+
+ - **Voice specified**: the description will serve as acting directions
+ for delivery. Keep directions concise—100 characters or fewer—for best
+ results. See our guide on [acting
  instructions](/docs/text-to-speech-tts/acting-instructions).
- - If no Voice is specified, a new voice is generated based on this description. See our [prompting guide](/docs/text-to-speech-tts/prompting) for tips on designing a voice.
+
+ - **Voice not specified**: the description will serve as a voice
+ prompt for generating a voice. See our [prompting
+ guide](/docs/text-to-speech-tts/prompting) for design tips.
  validation:
  maxLength: 1000
  speed:
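The two roles of `description` in the new docs can be sketched as a small classifier. This is an illustration only: `describe_role` and both constants are hypothetical names, the 100-character figure is the soft guidance from the docs rather than an enforced limit, and the 1000-character cap mirrors `validation.maxLength` in this hunk.

```python
from typing import Optional

MAX_DESCRIPTION_LENGTH = 1000  # hard cap, from `validation.maxLength`
ACTING_HINT_LENGTH = 100       # soft guidance for acting directions


def describe_role(description: str, voice: Optional[dict]) -> str:
    """Classify how `description` would be read, per the docs in this hunk.

    Hypothetical helper; it does not call the API.
    """
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("description exceeds 1000 characters")
    if voice is None:
        # No voice: the description is a prompt for generating one.
        return "voice prompt"
    if len(description) > ACTING_HINT_LENGTH:
        return "acting directions (longer than the suggested 100 characters)"
    return "acting directions"
```

For example, `describe_role("whispering, tense", {"name": "Male English Actor", "provider": "HUME_AI"})` is classified as acting directions, while the same call with `voice=None` is classified as a voice prompt.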
529 changes: 280 additions & 249 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -3,7 +3,7 @@ name = "hume"

  [tool.poetry]
  name = "hume"
- version = "0.9.1"
+ version = "0.9.2"
  description = "A Python SDK for Hume AI"
  readme = "README.md"
  authors = []
48 changes: 18 additions & 30 deletions reference.md
@@ -145,10 +145,9 @@ This setting affects how the `snippets` array is structured in the response, whi
  **instant_mode:** `typing.Optional[bool]`

  Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
- - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
- - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+ - A [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified when instant mode is enabled. Dynamic voice generation is not supported with this mode.
+ - Instant mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
  - Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
- - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.

  </dd>
  </dl>
@@ -294,10 +293,9 @@ This setting affects how the `snippets` array is structured in the response, whi
  **instant_mode:** `typing.Optional[bool]`

  Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
- - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
- - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+ - A [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified when instant mode is enabled. Dynamic voice generation is not supported with this mode.
+ - Instant mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
  - Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
- - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.

  </dd>
  </dl>
@@ -345,7 +343,7 @@ Streams synthesized speech using the specified voice. If no voice is provided, a

  ```python
  from hume import HumeClient
- from hume.tts import FormatMp3, PostedContextWithGenerationId, PostedUtterance
+ from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName

  client = HumeClient(
  api_key="YOUR_API_KEY",
@@ -354,14 +352,12 @@ client.tts.synthesize_file_streaming(
  utterances=[
  PostedUtterance(
  text="Beauty is no quality in things themselves: It exists merely in the mind which contemplates them.",
- description="Middle-aged masculine voice with a clear, rhythmic Scots lilt, rounded vowels, and a warm, steady tone with an articulate, academic quality.",
+ voice=PostedUtteranceVoiceWithName(
+ name="Male English Actor",
+ provider="HUME_AI",
+ ),
  )
  ],
- context=PostedContextWithGenerationId(
- generation_id="09ad914d-8e7f-40f8-a279-e34f07f7dab2",
- ),
- format=FormatMp3(),
- num_generations=1,
  )

  ```
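The reference example above calls `synthesize_file_streaming` but does not consume the result. The following is a minimal sketch of writing streamed audio to disk, assuming the call yields `bytes` chunks; that return type is an assumption here and is not shown in this diff. `save_audio` is a hypothetical helper that works with any iterable of bytes.

```python
from pathlib import Path
from typing import Iterable


def save_audio(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` in arrival order; return bytes written.

    Hypothetical helper, not part of the hume SDK.
    """
    total = 0
    with Path(path).open("wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total
```

If the assumption holds, passing the result of `client.tts.synthesize_file_streaming(...)` as `chunks` would write the audio as it arrives rather than buffering the whole response.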
@@ -441,10 +437,9 @@ This setting affects how the `snippets` array is structured in the response, whi
  **instant_mode:** `typing.Optional[bool]`

  Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
- - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
- - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+ - A [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified when instant mode is enabled. Dynamic voice generation is not supported with this mode.
+ - Instant mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
  - Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
- - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.

  </dd>
  </dl>
@@ -494,7 +489,7 @@ The response is a stream of JSON objects including audio encoded in base64.

  ```python
  from hume import HumeClient
- from hume.tts import FormatMp3, PostedContextWithUtterances, PostedUtterance
+ from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName

  client = HumeClient(
  api_key="YOUR_API_KEY",
@@ -503,18 +498,12 @@ response = client.tts.synthesize_json_streaming(
  utterances=[
  PostedUtterance(
  text="Beauty is no quality in things themselves: It exists merely in the mind which contemplates them.",
- description="Middle-aged masculine voice with a clear, rhythmic Scots lilt, rounded vowels, and a warm, steady tone with an articulate, academic quality.",
+ voice=PostedUtteranceVoiceWithName(
+ name="Male English Actor",
+ provider="HUME_AI",
+ ),
  )
  ],
- context=PostedContextWithUtterances(
- utterances=[
- PostedUtterance(
- text="How can people see beauty so differently?",
- description="A curious student with a clear and respectful tone, seeking clarification on Hume's ideas with a straightforward question.",
- )
- ],
- ),
- format=FormatMp3(),
  )
  for chunk in response.data:
  yield chunk
@@ -596,10 +585,9 @@ This setting affects how the `snippets` array is structured in the response, whi
  **instant_mode:** `typing.Optional[bool]`

  Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
- - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
- - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+ - A [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified when instant mode is enabled. Dynamic voice generation is not supported with this mode.
+ - Instant mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
  - Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
- - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.

  </dd>
  </dl>