Deepgram · AsyncAPI Specification
Deepgram Text-to-Speech Streaming Events
Version 1.0
The Deepgram Text-to-Speech streaming API provides real-time speech synthesis over a WebSocket connection. Text is sent as JSON messages and audio data is returned as binary WebSocket messages, enabling continuous streaming text-to-speech for conversational AI applications, voice agents, and real-time voice interfaces.
Tags: Artificial Intelligence, Speech-To-Text, Text-To-Speech, Transcription, Voice AI, AsyncAPI, Webhooks, Events
Channels
/v1/speak
Send text for speech synthesis
WebSocket channel for real-time text-to-speech streaming. The client sends text as JSON messages and receives synthesized audio as binary frames. Connection parameters include model, encoding, sample_rate, and container settings.
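As a sketch of how a client might assemble the connection URL from these parameters (the specific values below, including the model name, are illustrative assumptions, not documented defaults):

```python
from urllib.parse import urlencode

# Query parameters named in the channel description; the values
# here are illustrative assumptions, not defaults.
params = {
    "model": "aura-2-thalia-en",  # voice model (assumed example name)
    "encoding": "linear16",       # raw 16-bit PCM (assumed example value)
    "sample_rate": 24000,         # output sample rate in Hz
    "container": "none",          # no container wrapping around the audio
}

url = "wss://api.deepgram.com/v1/speak?" + urlencode(params)
print(url)
```

The resulting URL is what you would pass to your WebSocket client of choice when opening the connection.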
Messages
TextInput (Text Input): Text to synthesize into speech
Flush: Flush pending text
Reset: Reset the synthesis state
Close: Close the streaming session
AudioData (Audio Data): Synthesized speech audio data
Flushed: Flush confirmation
Warning: Warning message
TTSError (Error): Error message
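A minimal sketch of serializing the client-side messages, assuming each control message is a JSON object whose wire-level `type` follows Deepgram's streaming TTS docs (`Speak` for TextInput, `Clear` for Reset; `Flush` and `Close` match directly). Check the spec itself for the authoritative schema.

```python
import json

# Wire-level "type" values assumed from Deepgram's streaming TTS docs:
# TextInput -> "Speak", Reset -> "Clear"; Flush and Close match directly.
def speak(text: str) -> str:
    return json.dumps({"type": "Speak", "text": text})

def flush() -> str:
    return json.dumps({"type": "Flush"})

def clear() -> str:
    return json.dumps({"type": "Clear"})

def close() -> str:
    return json.dumps({"type": "Close"})

print(speak("Hello from streaming TTS."))
print(flush())
```

Each serialized string would be sent as a text frame; audio comes back on binary frames, as described for the channel above.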
Servers
production (wss): wss://api.deepgram.com/v1/speak
Deepgram production WebSocket server for real-time text-to-speech streaming. Connect with query parameters to configure the voice model, encoding, and sample rate.
eu (wss): wss://api.eu.deepgram.com/v1/speak
Deepgram EU WebSocket server for real-time text-to-speech streaming.
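Because the channel mixes binary audio frames with JSON event frames, a receiving client needs to dispatch on frame type. A sketch of that dispatch, assuming events carry a `type` field matching the message names listed above:

```python
import json

def handle_frame(frame):
    """Dispatch one WebSocket frame from the /v1/speak channel.

    Binary frames carry synthesized audio; text frames carry JSON
    events (Flushed, Warning, Error). The event names mirror the
    messages listed above; this handler is an illustrative sketch.
    """
    if isinstance(frame, (bytes, bytearray)):
        return ("audio", bytes(frame))   # raw synthesized audio payload
    event = json.loads(frame)
    kind = event.get("type", "Unknown")
    if kind == "Error":
        raise RuntimeError(event)        # surface errors to the caller
    return (kind.lower(), event)

print(handle_frame(b"\x00\x01"))
print(handle_frame('{"type": "Flushed"}'))
```

A real client would feed each received frame through a function like this and append the audio chunks to a playback buffer.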