elevenlabs · AsyncAPI Specification
ElevenLabs Text to Speech Streaming Events
Version 1.0
The ElevenLabs Text to Speech WebSocket API enables bidirectional streaming for text-to-speech conversion. Clients send text chunks incrementally and receive audio chunks as they are generated, so playback can begin before the full text has been sent. This keeps latency low for real-time applications.
Channels
/stream-input
Receive generated audio chunks
Bidirectional WebSocket channel for streaming text-to-speech. Clients send text chunks and receive audio chunks in real time as the model generates speech.
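The receive side of this channel can be sketched as a small, transport-independent helper. This is a minimal illustration, not the official client: it assumes that each incoming frame is a JSON text message, that audio arrives base64-encoded in a field named audio, and that the final event carries a truthy isFinal flag. None of these field names are given in this specification summary, so treat them as assumptions. The function name collect_audio is hypothetical.

```python
import base64
import json
from typing import Iterable


def collect_audio(frames: Iterable[str]) -> bytes:
    """Concatenate decoded audio from incoming frames until a final event.

    Assumed frame shapes (not confirmed by this summary):
      {"audio": "<base64>"}   -> AudioChunkEvent
      {"isFinal": true}       -> FinalEvent
    Frames without an "audio" field (e.g. alignment-only) are skipped.
    """
    audio = bytearray()
    for frame in frames:
        event = json.loads(frame)
        if event.get("isFinal"):
            # Final event: stop reading, ignore anything that follows.
            break
        payload = event.get("audio")
        if payload:
            audio.extend(base64.b64decode(payload))
    return bytes(audio)
```

In a real client, frames would be whatever iterable of text messages the chosen WebSocket library exposes. Keeping the helper independent of the transport makes it easy to exercise with recorded or synthetic frames.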
Messages
AudioChunkEvent
Audio Chunk
Generated audio data chunk
AlignmentEvent
Alignment Data
Word-level timing alignment data
FinalEvent
Final Event
Signals the end of audio generation
InitMessage
Initialization Message
Initial configuration for the streaming session
TextChunkMessage
Text Chunk
Text input for speech synthesis
FlushMessage
Flush
Forces generation of remaining audio
CloseMessage
Close
Signals the end of text input
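The order in which a client would typically emit the four client-side messages listed above (InitMessage, TextChunkMessage, FlushMessage, CloseMessage) can be sketched as a generator. The JSON payload shapes here are assumptions, since this summary lists only message names and descriptions: a single-space text for initialization, a flush boolean for FlushMessage, and an empty text string for CloseMessage. The function name client_messages is hypothetical. Check the full message schemas before relying on any field.

```python
import json
from typing import Iterable, Iterator


def client_messages(chunks: Iterable[str]) -> Iterator[str]:
    """Yield JSON frames in the order init, text chunks, flush, close.

    All payload fields below are assumed for illustration only.
    """
    # InitMessage: opens the session (assumed: a single space as text).
    yield json.dumps({"text": " "})

    # TextChunkMessage: one frame per piece of input text.
    for chunk in chunks:
        yield json.dumps({"text": chunk})

    # FlushMessage: asks the server to generate any buffered audio.
    yield json.dumps({"text": " ", "flush": True})

    # CloseMessage: signals that no more text will follow.
    yield json.dumps({"text": ""})
```

For example, list(client_messages(["Hello ", "world."])) produces five frames. The explicit flush before closing is included only to show where FlushMessage fits in the sequence; whether it is needed ahead of CloseMessage is not stated in this summary.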
Servers
wss
production
wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input
ElevenLabs Text to Speech WebSocket server for bidirectional streaming synthesis.
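The {voice_id} placeholder in the server URL must be filled in before connecting. A minimal sketch of that substitution follows. The helper name build_stream_url is hypothetical, and percent-encoding the identifier is a defensive choice rather than a documented requirement.

```python
from urllib.parse import quote

# URL template taken from the Servers entry above.
STREAM_URL_TEMPLATE = (
    "wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input"
)


def build_stream_url(voice_id: str) -> str:
    """Return the production WebSocket URL for the given voice."""
    if not voice_id:
        raise ValueError("voice_id must be a non-empty string")
    # Percent-encode so the ID cannot introduce extra path segments.
    return STREAM_URL_TEMPLATE.format(voice_id=quote(voice_id, safe=""))
```

Authentication and any query parameters are outside the scope of this summary and are deliberately left out of the sketch.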