Deepgram · AsyncAPI Specification

Deepgram Speech-to-Text Streaming Events

Version 1.0

The Deepgram Speech-to-Text streaming API provides real-time transcription of audio over a WebSocket connection. The client sends audio as binary WebSocket messages and receives transcription results as JSON messages in real time, including interim results, final results, speaker diarization, and speech detection events. The API supports the same model family and feature parameters as the pre-recorded API.

Tags: Artificial Intelligence · Speech-To-Text · Text-To-Speech · Transcription · Voice AI · AsyncAPI · Webhooks · Events

Channels

/v1/listen
publish sendAudioData
Send audio data for real-time transcription
WebSocket channel for real-time speech-to-text streaming. The client sends binary audio frames and receives JSON transcription events. Connection parameters include model, language, punctuate, diarize, smart_format, interim_results, utterance_end_ms, vad_events, and encoding options.
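The connection parameters listed above are passed as a query string when opening the WebSocket. A minimal sketch of building the session URL (the specific values shown, such as the model name and sample rate, are illustrative assumptions, not a fixed list):

```python
from urllib.parse import urlencode

def build_listen_url(base: str = "wss://api.deepgram.com/v1/listen",
                     **params: str) -> str:
    """Append session configuration as query parameters to the channel URL."""
    return f"{base}?{urlencode(params)}" if params else base

# Hypothetical session configuration; adjust to your account and audio source.
url = build_listen_url(
    model="nova-2",            # assumed model name
    language="en",
    punctuate="true",
    interim_results="true",
    utterance_end_ms="1000",
    vad_events="true",
    encoding="linear16",       # raw 16-bit PCM
    sample_rate="16000",
)
```

The resulting URL is then opened with any WebSocket client, passing the API key per the spec's `bearerAuth` scheme (a `token` query parameter or an Authorization header).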

Messages

AudioFrame
Audio Frame
Binary audio data frame
CloseStream
Close Stream
Signal to close the audio stream
KeepAlive
Keep Alive
Keep the connection alive
TranscriptResult
Transcript Result
Real-time transcription result
SpeechStarted
Speech Started
Speech activity detected
UtteranceEnd
Utterance End
End of utterance detected
StreamMetadata
Stream Metadata
Stream metadata information
StreamError
Stream Error
Stream error event
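Each server-to-client message carries a `type` field (`Results`, `SpeechStarted`, `UtteranceEnd`, `Metadata`, `Error`) that identifies which payload schema applies. A minimal dispatcher sketch over parsed events (field access follows the payload schemas in the spec below; the output formatting is illustrative):

```python
import json

def handle_event(raw: str) -> str:
    """Route a server JSON message by its `type` discriminator."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "Results":
        alt = event["channel"]["alternatives"][0]
        marker = "final" if event.get("is_final") else "interim"
        return f"[{marker}] {alt.get('transcript', '')}"
    if kind == "SpeechStarted":
        return f"speech started at {event.get('timestamp')}s"
    if kind == "UtteranceEnd":
        return f"utterance ended at {event.get('last_word_end')}s"
    if kind == "Metadata":
        return f"session {event.get('request_id')}"
    if kind == "Error":
        return f"error: {event.get('description')}"
    return f"unhandled message type: {kind!r}"
```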

Servers

wss
production wss://api.deepgram.com/v1/listen
Deepgram production WebSocket server for real-time speech-to-text streaming. Connect with query parameters to configure the transcription session.
wss
eu wss://api.eu.deepgram.com/v1/listen
Deepgram EU WebSocket server for real-time speech-to-text streaming.

AsyncAPI Specification

asyncapi: 2.6.0
info:
  title: Deepgram Speech-to-Text Streaming Events
  description: >-
    The Deepgram Speech-to-Text streaming API provides real-time transcription
    of audio over a WebSocket connection. The client sends audio as binary
    WebSocket messages and receives transcription results as JSON messages in
    real time, including interim results, final results, speaker diarization,
    and speech detection events. The API supports the same model family and
    feature parameters as the pre-recorded API.
  version: '1.0'
  contact:
    name: Deepgram Support
    url: https://developers.deepgram.com
servers:
  production:
    url: 'wss://api.deepgram.com/v1/listen'
    protocol: wss
    description: >-
      Deepgram production WebSocket server for real-time speech-to-text
      streaming. Connect with query parameters to configure the transcription
      session.
    security:
      - bearerAuth: []
  eu:
    url: 'wss://api.eu.deepgram.com/v1/listen'
    protocol: wss
    description: >-
      Deepgram EU WebSocket server for real-time speech-to-text streaming.
    security:
      - bearerAuth: []
channels:
  /v1/listen:
    description: >-
      WebSocket channel for real-time speech-to-text streaming. The client
      sends binary audio frames and receives JSON transcription events.
      Connection parameters include model, language, punctuate, diarize,
      smart_format, interim_results, utterance_end_ms, vad_events, and
      encoding options.
    publish:
      operationId: sendAudioData
      summary: Send audio data for real-time transcription
      description: >-
        Client sends binary audio data frames to the WebSocket connection.
        Audio should be sent as binary WebSocket messages. Send a JSON close
        message to signal end of audio stream.
      message:
        oneOf:
          - $ref: '#/components/messages/AudioFrame'
          - $ref: '#/components/messages/CloseStream'
          - $ref: '#/components/messages/KeepAlive'
    subscribe:
      operationId: receiveTranscriptionEvents
      summary: Receive transcription events
      description: >-
        Server sends JSON messages containing transcription results, metadata,
        and stream lifecycle events.
      message:
        oneOf:
          - $ref: '#/components/messages/TranscriptResult'
          - $ref: '#/components/messages/SpeechStarted'
          - $ref: '#/components/messages/UtteranceEnd'
          - $ref: '#/components/messages/StreamMetadata'
          - $ref: '#/components/messages/StreamError'
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Deepgram API key passed as a token query parameter or Authorization
        header when establishing the WebSocket connection.
  messages:
    AudioFrame:
      name: AudioFrame
      title: Audio Frame
      summary: Binary audio data frame
      description: >-
        Raw binary audio data sent as a WebSocket binary message. The audio
        encoding format should be specified via connection query parameters.
      contentType: application/octet-stream
      payload:
        type: string
        format: binary
        description: >-
          Raw binary audio data in the configured encoding format.
    CloseStream:
      name: CloseStream
      title: Close Stream
      summary: Signal to close the audio stream
      description: >-
        JSON message sent by the client to signal the end of the audio
        stream, triggering final processing of any remaining audio.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/CloseStreamPayload'
    KeepAlive:
      name: KeepAlive
      title: Keep Alive
      summary: Keep the connection alive
      description: >-
        JSON message sent by the client to keep the WebSocket connection
        alive during periods of silence without closing the stream.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/KeepAlivePayload'
    TranscriptResult:
      name: TranscriptResult
      title: Transcript Result
      summary: Real-time transcription result
      description: >-
        JSON message containing transcription results. Can be an interim
        result (is_final=false) or a final result (is_final=true) depending
        on the interim_results connection parameter.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/TranscriptResultPayload'
    SpeechStarted:
      name: SpeechStarted
      title: Speech Started
      summary: Speech activity detected
      description: >-
        Event indicating that speech activity has been detected in the
        audio stream. Sent when vad_events is enabled.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/SpeechStartedPayload'
    UtteranceEnd:
      name: UtteranceEnd
      title: Utterance End
      summary: End of utterance detected
      description: >-
        Event indicating that the end of an utterance has been detected
        based on the configured utterance_end_ms threshold.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/UtteranceEndPayload'
    StreamMetadata:
      name: StreamMetadata
      title: Stream Metadata
      summary: Stream metadata information
      description: >-
        Metadata about the streaming session including request ID, model
        information, and session configuration.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/StreamMetadataPayload'
    StreamError:
      name: StreamError
      title: Stream Error
      summary: Stream error event
      description: >-
        Error event indicating an issue with the streaming session.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/StreamErrorPayload'
  schemas:
    CloseStreamPayload:
      type: object
      required:
        - type
      properties:
        type:
          type: string
          const: CloseStream
          description: >-
            Message type identifier.
    KeepAlivePayload:
      type: object
      required:
        - type
      properties:
        type:
          type: string
          const: KeepAlive
          description: >-
            Message type identifier.
    TranscriptResultPayload:
      type: object
      properties:
        type:
          type: string
          const: Results
          description: >-
            Message type identifier.
        channel_index:
          type: array
          items:
            type: integer
          description: >-
            Index of this channel and the total channel count (e.g. [0, 1]).
        duration:
          type: number
          format: float
          description: >-
            Duration of audio processed in seconds.
        start:
          type: number
          format: float
          description: >-
            Start time of this result in seconds.
        is_final:
          type: boolean
          description: >-
            Whether this is a final or interim result.
        speech_final:
          type: boolean
          description: >-
            Whether the speech endpoint has been detected.
        channel:
          type: object
          properties:
            alternatives:
              type: array
              items:
                $ref: '#/components/schemas/StreamAlternative'
              description: >-
                Alternative transcriptions ordered by confidence.
          description: >-
            Channel transcription data.
    StreamAlternative:
      type: object
      properties:
        transcript:
          type: string
          description: >-
            Transcript text for this alternative.
        confidence:
          type: number
          format: float
          description: >-
            Confidence score for this alternative.
          minimum: 0
          maximum: 1
        words:
          type: array
          items:
            $ref: '#/components/schemas/StreamWord'
          description: >-
            Individual words with timing information.
    StreamWord:
      type: object
      properties:
        word:
          type: string
          description: >-
            The transcribed word.
        start:
          type: number
          format: float
          description: >-
            Start time of the word in seconds.
        end:
          type: number
          format: float
          description: >-
            End time of the word in seconds.
        confidence:
          type: number
          format: float
          description: >-
            Confidence score for this word.
        speaker:
          type: integer
          description: >-
            Speaker identifier when diarization is enabled.
        punctuated_word:
          type: string
          description: >-
            The word with punctuation applied.
    SpeechStartedPayload:
      type: object
      properties:
        type:
          type: string
          const: SpeechStarted
          description: >-
            Message type identifier.
        channel:
          type: array
          items:
            type: integer
          description: >-
            Channel indices where speech was detected.
        timestamp:
          type: number
          format: float
          description: >-
            Timestamp in seconds when speech was detected.
    UtteranceEndPayload:
      type: object
      properties:
        type:
          type: string
          const: UtteranceEnd
          description: >-
            Message type identifier.
        channel:
          type: array
          items:
            type: integer
          description: >-
            Channel indices for the utterance.
        last_word_end:
          type: number
          format: float
          description: >-
            Timestamp in seconds of the last word in the utterance.
    StreamMetadataPayload:
      type: object
      properties:
        type:
          type: string
          const: Metadata
          description: >-
            Message type identifier.
        transaction_key:
          type: string
          description: >-
            Transaction key for this session.
        request_id:
          type: string
          description: >-
            Unique request identifier for this session.
        sha256:
          type: string
          description: >-
            SHA-256 hash identifier.
        created:
          type: string
          format: date-time
          description: >-
            Timestamp when the session was created.
        duration:
          type: number
          format: float
          description: >-
            Total duration of audio processed, in seconds.
        channels:
          type: integer
          description: >-
            Number of audio channels.
        models:
          type: array
          items:
            type: string
          description: >-
            Model identifiers used for transcription.
        model_info:
          type: object
          additionalProperties: true
          description: >-
            Detailed model information.
    StreamErrorPayload:
      type: object
      properties:
        type:
          type: string
          const: Error
          description: >-
            Message type identifier.
        description:
          type: string
          description: >-
            Human-readable error description.
        message:
          type: string
          description: >-
            Error message.
        variant:
          type: string
          description: >-
            Error variant classifier.
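To close with a sketch of the client side of this contract: binary frames carry audio, while `KeepAlive` and `CloseStream` are the only JSON messages the client sends. The helpers below serialize those control messages and split raw linear16 PCM into frame-sized binary messages (the 20 ms frame size is a common choice, not something this spec mandates):

```python
import json

def control_message(kind: str) -> str:
    """Serialize a client control message as defined by the payload schemas."""
    if kind not in ("KeepAlive", "CloseStream"):
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})

def chunk_audio(pcm: bytes, frame_ms: int = 20,
                sample_rate: int = 16000, bytes_per_sample: int = 2) -> list[bytes]:
    """Split raw PCM into frame-sized chunks to send as binary WebSocket messages."""
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
```

A client would send each chunk as a binary WebSocket message, emit `control_message("KeepAlive")` during long silences, and finish with `control_message("CloseStream")` to trigger final processing of any remaining audio.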