Zeta — TTS Playground

Sinhala text-to-speech, served through an OpenAI-compatible POST /v1/audio/speech endpoint. The first call after an idle period triggers a cold start (~40–60 s).
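
A minimal sketch of calling the endpoint from a script, assuming the server accepts the usual OpenAI speech fields (model, input, voice). The base URL, the model name "zeta", and the voice value "default" are placeholders, not values confirmed by this page. The long timeout is there to survive a cold start.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the value from the "Endpoint base URL" field.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, base_url=BASE_URL, model="zeta", voice="default"):
    """Build (but do not send) a POST /v1/audio/speech request.

    Field names follow the OpenAI speech API; "zeta" and "default" are
    placeholder values that the real server may not recognize.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, timeout_s=120):
    """Send the request; the long timeout covers a ~40-60 s cold start."""
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()  # raw audio bytes
```

Building the request separately from sending it makes the payload easy to inspect before any network traffic happens.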

Endpoint base URL

Health: —

Playback speed: 1.00× (pitch-preserved; applies to replay/buffered players)

Voice zero-shot

This model is a continuation-cloning TTS. Zero-shot, it improvises a voice per request. Pin a reference to get the same voice every time.

Pin voice (clone from reference)

Reference source

Reference transcript (ref_text — must match the reference audio)
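
A hedged sketch of how a pinned-voice request body might be assembled outside the playground. Only ref_text comes from this page. The ref_audio field name, its base64 encoding, the model value, and the function name are assumptions about the server and should be checked against its actual schema.

```python
import base64


def pinned_voice_payload(text, ref_audio_bytes, ref_text, model="zeta"):
    """Return a JSON-serializable body that carries a voice reference."""
    if not ref_text.strip():
        raise ValueError("ref_text must match the reference audio; it cannot be empty")
    return {
        "model": model,        # placeholder model name
        "input": text,
        "ref_text": ref_text,  # label used on this page
        # Assumed field name and encoding; the real server may differ.
        "ref_audio": base64.b64encode(ref_audio_bytes).decode("ascii"),
    }
```

Sending the same reference clip and transcript with every request is what keeps the voice stable across calls.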

Single synthesis

Text (Sinhala)

num_steps

stream & play as it arrives

Concurrency spike test

Fires N distinct texts in parallel to exercise continuous batching. If batching works, the total wall-clock time should be far below N × the single-request latency. With the voice pinned, every clip should also share one voice.
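
The same check can be sketched in a script. The helper below is illustrative (spike_test is a hypothetical name) and takes any synthesize(text) callable, so it can be pointed at a real request function or at a stand-in.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def spike_test(synthesize, texts):
    """Run synthesize(text) for all texts in parallel and time each call.

    Returns (wall_seconds, per_request_seconds).
    """
    def timed(text):
        start = time.perf_counter()
        synthesize(text)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    # One worker per text so that every request is in flight at once.
    with ThreadPoolExecutor(max_workers=len(texts)) as pool:
        latencies = list(pool.map(timed, texts))
    wall = time.perf_counter() - wall_start
    return wall, latencies
```

To apply the criterion above, measure one request on its own first, then compare the returned wall time against N times that single-request latency. Per-request latencies measured under load will usually be higher than the lone-request figure, so they are reported separately.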

concurrency (1–12)

No build step. Keep ref.js next to this file for the built-in anchor voice. Audio is 48 kHz mono.
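
If the server returns headerless PCM, the stated format is enough to wrap it in a WAV container and compute clip length. A small sketch follows. The 48 kHz mono figures come from this page, while 16-bit samples are an assumption.

```python
import io
import wave

SAMPLE_RATE = 48_000  # Hz, from this page
CHANNELS = 1          # mono, from this page
SAMPLE_WIDTH = 2      # bytes per sample; 16-bit PCM is an assumption


def pcm_to_wav(pcm_bytes):
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


def duration_seconds(pcm_bytes):
    """Clip length implied by the byte count at the assumed format."""
    return len(pcm_bytes) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)
```

If the response already arrives as WAV or a compressed format, this wrapping step is unnecessary.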