Streaming TTS via MediaSource
patterns platform audio streaming elevenlabs
Proxy ElevenLabs’ streaming TTS through the backend as a raw ReadableStream and play it in the browser via MediaSource + SourceBuffer, so audio starts at the first chunk instead of after the full clip renders.
Content
Backend (POST /avatar/tts-stream in PrePitch, CON-113):
- Upstream: https://api.elevenlabs.io/v1/text-to-speech/:voice/stream?output_format=mp3_44100_128
- Model: eleven_turbo_v2_5 (≈ half the synthesis latency of eleven_turbo_v2).
- Return new Response(upstream.body, { headers }) directly. Do not await upstream.arrayBuffer() or similar; Bun/Hono won’t buffer when the handler returns a raw ReadableStream.
- 503 when ELEVENLABS_API_KEY unset; 502 on non-ok upstream or missing body.
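The backend rules above can be sketched as a framework-agnostic handler built on the standard Request/Response types. This is a minimal sketch, not the PrePitch implementation: the names handleTtsStream and TtsEnv, the injected fetchImpl parameter, and the { text, voice } request body shape are all assumptions made for illustration.

```typescript
// Hypothetical env shape; only the key name comes from the notes above.
type TtsEnv = { ELEVENLABS_API_KEY?: string };

// Hypothetical handler. fetchImpl is injectable only so the sketch can be tested offline.
async function handleTtsStream(
  req: Request,
  env: TtsEnv,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  if (!env.ELEVENLABS_API_KEY) {
    return new Response("TTS not configured", { status: 503 });
  }

  // Assumed request body shape: { text, voice }.
  const { text, voice } = (await req.json()) as { text: string; voice: string };

  const url =
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voice)}` +
    `/stream?output_format=mp3_44100_128`;

  const upstream = await fetchImpl(url, {
    method: "POST",
    headers: {
      "xi-api-key": env.ELEVENLABS_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model_id: "eleven_turbo_v2_5" }),
  });

  if (!upstream.ok || !upstream.body) {
    return new Response("Upstream TTS failed", { status: 502 });
  }

  // Hand the upstream ReadableStream straight to the client; never buffer it.
  return new Response(upstream.body, {
    headers: { "Content-Type": "audio/mpeg", "Cache-Control": "no-store" },
  });
}
```

In a Hono route this would presumably be called with the raw request and returned as-is, since Hono handlers may return a standard Response.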
Frontend (triggerVoiceTTS rewrite):
- Feature-detect MediaSource.isTypeSupported('audio/mpeg'); fall back to a preserved triggerVoiceTTS_legacy body from every failure branch (unsupported type, addSourceBuffer throw, fetch reject, non-ok res, setup exception).
- Attach mediaSource to an <audio>, addSourceBuffer('audio/mpeg').
- Chunk queue: pending[], drain on updateend. SourceBuffer.appendBuffer throws while updating===true, so it is NOT re-entrant; always queue and drain.
- Start audio.play() on first chunk so playback overlaps with download.
- finalize() polls until SourceBuffer is idle and pending[] empty, then endOfStream().
- Set voiceAudioEl = audio before play() so interruptTTS() can still pause + null it.
- Wire audio.onended to restart speech recognition per the existing voice loop.
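The queue-and-drain rule is the piece most easily gotten wrong, so here is a minimal sketch of it in isolation. ChunkQueue and AppendTarget are hypothetical names; AppendTarget is a narrow stand-in for the parts of SourceBuffer the queue touches (updating, appendBuffer, the updateend event), which keeps the sketch runnable outside a browser.

```typescript
// Narrow stand-in for the SourceBuffer members used here (hypothetical interface).
interface AppendTarget {
  readonly updating: boolean;
  appendBuffer(data: Uint8Array): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Hypothetical helper: serializes every appendBuffer call through pending[].
class ChunkQueue {
  private pending: Uint8Array[] = [];
  private sb: AppendTarget;

  constructor(sb: AppendTarget) {
    this.sb = sb;
    // Each finished append frees the buffer for the next queued chunk.
    sb.addEventListener("updateend", () => this.drain());
  }

  // Always enqueue first; never call appendBuffer directly from the read loop.
  push(chunk: Uint8Array): void {
    this.pending.push(chunk);
    this.drain();
  }

  // True when finalize() may call endOfStream().
  get idle(): boolean {
    return !this.sb.updating && this.pending.length === 0;
  }

  private drain(): void {
    if (this.sb.updating) return; // appendBuffer would throw here
    const next = this.pending.shift();
    if (next) this.sb.appendBuffer(next);
  }
}
```

In the browser, the target would be the SourceBuffer obtained from addSourceBuffer('audio/mpeg') once the MediaSource has fired sourceopen, and the fetch read loop would call push() for every chunk, starting audio.play() after the first one.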
Legacy fallback (/avatar/tts + /avatar/audio/:id) stays untouched — the streaming path must degrade cleanly on Safari iOS etc.
Gotchas
- triggerVoiceTTS fetch is NOT abortable by default. interruptTTS() pauses the audio element but the res.body.getReader() loop keeps pulling bytes until ElevenLabs closes the stream → bandwidth waste + timer leak through drain/finalize. Tie an AbortController to the fetch and abort from interruptTTS.
- finalize() recursive setTimeout(50ms) has no iteration cap. A SourceBuffer cannot actually stay updating forever, but cap at ~10 retries before calling endOfStream() anyway to harden against browser quirks.
- Autoplay-blocked path must flip voice state back to listening and restart SR, or the session silently deadlocks.
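The three gotchas can each be addressed with a small helper. The sketch below is one possible hardening, under stated assumptions: pumpStream, finalizeWithCap, and playOrRecover are hypothetical names, and the callbacks (onChunk, isIdle, endOfStream, recover) stand in for whatever the real triggerVoiceTTS wires up.

```typescript
// Read loop that stops as soon as the signal is aborted (e.g. from interruptTTS).
async function pumpStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
  signal: AbortSignal,
): Promise<"done" | "aborted"> {
  const reader = body.getReader();
  try {
    for (;;) {
      if (signal.aborted) {
        await reader.cancel(); // stop pulling bytes from upstream
        return "aborted";
      }
      const result = await reader.read();
      if (result.done) return "done";
      onChunk(result.value);
    }
  } catch (err) {
    // With fetch(url, { signal }), a pending read rejects on abort.
    if (signal.aborted) return "aborted";
    throw err;
  }
}

// Poll until idle, but end the stream anyway after maxRetries polls.
function finalizeWithCap(
  isIdle: () => boolean,
  endOfStream: () => void,
  maxRetries = 10,
  delayMs = 50,
  signal?: AbortSignal,
): void {
  let attempts = 0;
  const tick = (): void => {
    if (signal?.aborted) return; // interrupted: schedule no further timers
    if (isIdle() || attempts >= maxRetries) {
      try {
        endOfStream();
      } catch {
        // MediaSource.endOfStream() throws InvalidStateError if a buffer is
        // still updating; on the forced path there is nothing left to do.
      }
      return;
    }
    attempts += 1;
    setTimeout(tick, delayMs);
  };
  tick();
}

// Run recover() if play() is rejected (e.g. autoplay blocked).
async function playOrRecover(
  audio: { play(): Promise<void> },
  recover: () => void,
): Promise<boolean> {
  try {
    await audio.play();
    return true;
  } catch {
    recover(); // e.g. flip voice state back to listening and restart SR
    return false;
  }
}
```

Sharing one AbortController between the fetch, pumpStream, and finalizeWithCap means a single abort() from interruptTTS stops both the download and the polling timers.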
Related
Source: raw/eval-2026-04-18-prepitch-latency.md | Ingested: 2026-04-18