Deepgram · AsyncAPI Specification
Deepgram Speech-to-Text Streaming Events
Version 1.0
The Deepgram Speech-to-Text streaming API provides real-time transcription of audio over a WebSocket connection. Audio data is sent as binary WebSocket messages, and transcription results are returned as JSON messages in real time, with support for interim results, final results, speaker diarization, and speech-detection events. The API supports the same model family and feature parameters as the pre-recorded API.
Tags: Artificial Intelligence, Speech-To-Text, Text-To-Speech, Transcription, Voice AI, AsyncAPI, Webhooks, Events
Channels
/v1/listen
Send audio data for real-time transcription
WebSocket channel for real-time speech-to-text streaming. The client sends binary audio frames and receives JSON transcription events. Connection parameters include model, language, punctuate, diarize, smart_format, interim_results, utterance_end_ms, vad_events, and encoding options.
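Since the session is configured entirely through query parameters on the connection URL, building that URL is the first step of any client. The sketch below assembles a `/v1/listen` URL from the parameters named above; the helper name and the `model` value are illustrative assumptions, not part of the spec.

```python
import urllib.parse

# Hypothetical helper: builds the wss://.../v1/listen URL from the
# query parameters listed in the channel description.
def build_listen_url(host: str = "api.deepgram.com", **params: str) -> str:
    query = urllib.parse.urlencode(params)
    return f"wss://{host}/v1/listen?{query}"

url = build_listen_url(
    model="nova-2",           # assumed model name, for illustration only
    language="en",
    punctuate="true",
    diarize="true",
    smart_format="true",
    interim_results="true",
    utterance_end_ms="1000",
    vad_events="true",
    encoding="linear16",
)
```

The resulting URL is what the client opens as a WebSocket; binary audio frames are then written to that connection.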
Messages
AudioFrame
Audio Frame
Binary audio data frame
CloseStream
Close Stream
Signal to close the audio stream
KeepAlive
Keep Alive
Keep the connection alive
TranscriptResult
Transcript Result
Real-time transcription result
SpeechStarted
Speech Started
Speech activity detected
UtteranceEnd
Utterance End
End of utterance detected
StreamMetadata
Stream Metadata
Stream metadata information
StreamError
Stream Error
Stream error event
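Incoming JSON events can be routed by their `type` field, and the client-to-server control messages (KeepAlive, CloseStream) are sent as JSON text frames. The dispatcher below is a minimal sketch: the `"Results"`/`"Metadata"` type strings and the transcript payload shape (`channel.alternatives[0].transcript`, `is_final`) are assumptions about the schema, not taken from this listing.

```python
import json

# Sketch of a client-side dispatcher for the JSON events above.
# Payload field names are illustrative, not the full schema.
def handle_message(raw: str) -> str:
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "Results":                      # TranscriptResult
        alt = event["channel"]["alternatives"][0]
        state = "final" if event.get("is_final") else "interim"
        return f"{state}: {alt['transcript']}"
    if kind == "SpeechStarted":
        return "speech started"
    if kind == "UtteranceEnd":
        return "utterance ended"
    if kind == "Metadata":                     # StreamMetadata
        return f"request {event.get('request_id', '?')}"
    if kind == "Error":                        # StreamError
        return f"error: {event.get('description', '?')}"
    return f"unhandled event: {kind}"

# Control messages are plain JSON text frames:
keep_alive = json.dumps({"type": "KeepAlive"})
close_stream = json.dumps({"type": "CloseStream"})
```

A client would call `handle_message` for each text frame received, while periodically sending `keep_alive` during silence and `close_stream` when the audio source is exhausted.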
Servers
wss
production
wss://api.deepgram.com/v1/listen
Deepgram production WebSocket server for real-time speech-to-text streaming. Connect with query parameters to configure the transcription session.
wss
eu
wss://api.eu.deepgram.com/v1/listen
Deepgram EU WebSocket server for real-time speech-to-text streaming.
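With two regional servers available, a client can pick the endpoint by region name. This is a minimal sketch assuming only the two servers listed above; the region keys mirror the server names in this document.

```python
# Map of server name -> WebSocket URL, taken from the server listing above.
SERVERS = {
    "production": "wss://api.deepgram.com/v1/listen",
    "eu": "wss://api.eu.deepgram.com/v1/listen",
}

def server_url(region: str = "production") -> str:
    """Return the streaming endpoint for a region, failing loudly on typos."""
    try:
        return SERVERS[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
```

Query parameters from the channel description are then appended to whichever base URL is chosen.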