Inworld AI · AsyncAPI Specification

Inworld AI Runtime WebSocket APIs

Version 1.0.0

AsyncAPI description of Inworld AI's publicly documented runtime WebSocket surface. Inworld exposes three independent WebSocket endpoints: * **TTS streaming** — bidirectional text-to-speech synthesis with per-context configuration, flush, and close semantics (`/tts/v1/voice:streamBidirectional`). * **STT streaming** — bidirectional speech-to-text transcription that accepts a config message, base64-encoded audio chunks, and end-of-turn signals (`/stt/v1/transcribe:streamBidirectional`). * **Realtime API** — end-to-end speech-to-speech sessions using an OpenAI-Realtime-API-compatible event protocol (`/api/v1/realtime/session`). All three endpoints authenticate with HTTP Basic using a Base64-encoded Inworld API key (Authorization header), per the published documentation. This document is a non-fabricated reconstruction of the message shapes published at https://docs.inworld.ai (TTS WebSocket, STT WebSocket, and Realtime WebSocket reference pages). Field sets reflect the documented JSON payloads; many Realtime server-sent events are documented narratively by Inworld and are modeled here as open objects.

View Spec View on GitHub AIArtificial IntelligenceVoiceText To SpeechSpeech To TextRealtimeLLM RoutingVoice CloningConversational AIGame AIAsyncAPIWebhooksEvents

Channels

/tts/v1/voice:streamBidirectional
publish ttsClientMessage
Messages sent from client to Inworld TTS.
Bidirectional WebSocket endpoint for streaming text-to-speech synthesis. A single connection may host up to five contexts; the account is limited to 20 concurrent connections. The connection auto-closes after 10 minutes of inactivity across all contexts.
/stt/v1/transcribe:streamBidirectional
publish sttClientMessage
Messages sent from client to Inworld STT.
Bidirectional WebSocket endpoint for streaming speech-to-text transcription. The first client message must be a TranscribeConfig followed by AudioChunk messages and (optionally) EndTurn / CloseStream control messages.
/api/v1/realtime/session
publish realtimeClientEvent
Events sent from client to the Realtime session.
Bidirectional WebSocket endpoint for Inworld Realtime sessions. The protocol is intentionally compatible with the OpenAI Realtime API event protocol; existing OpenAI Realtime clients can connect by changing the base URL and authentication header. Inworld extends the session object with a `providerData` field for STT, TTS, memory, backchannel, and responsiveness extensions (see https://docs.inworld.ai/realtime/provider-data).

Messages

TtsCreateContext
Create TTS context
Establishes an independent synthesis stream within the connection (max 5 per connection).
TtsSendText
Send text for synthesis
Sends up to 1000 characters of text to a context.
TtsFlushContext
Flush buffered text
Manually triggers synthesis of buffered text.
TtsCloseContext
Close TTS context
Terminates a context and releases resources, flushing pending text.
TtsContextCreated
Context created (server)
TtsAudioChunk
Audio chunk (server)
TtsFlushCompleted
Flush completed (server)
TtsContextClosed
Context closed (server)
SttTranscribeConfig
Transcribe config (client, first message)
SttAudioChunk
Audio chunk (client)
SttEndTurn
End turn (client)
SttCloseStream
Close stream (client)
SttTranscription
Transcription (server)
SttSpeechStarted
Speech started (server)
SttSpeechStopped
Speech stopped (server)
SttUsage
Usage (server, coming soon)
RealtimeSessionUpdate
session.update (client)
RealtimeConversationItemCreate
conversation.item.create (client)
RealtimeConversationItemTruncate
conversation.item.truncate (client)
RealtimeConversationItemDelete
conversation.item.delete (client)
RealtimeConversationItemRetrieve
conversation.item.retrieve (client)
RealtimeResponseCreate
response.create (client)
RealtimeResponseCancel
response.cancel (client)
RealtimeInputAudioBufferAppend
input_audio_buffer.append (client)
RealtimeInputAudioBufferCommit
input_audio_buffer.commit (client)
RealtimeInputAudioBufferClear
input_audio_buffer.clear (client)
RealtimeOutputAudioBufferClear
output_audio_buffer.clear (client)
RealtimeSessionCreated
session.created (server)
RealtimeSessionUpdated
session.updated (server)
RealtimeGenericServerEvent
Other Realtime server event
Inworld documents additional server events that follow the OpenAI Realtime API protocol — including (non-exhaustive) `response.start`, `response.content_part.added`, `response.content_part.delta`, `response.done`, `conversation.item.created`, `input_audio_buffer.speech_started`, and `input_audio_buffer.speech_stopped`. The exact field set for these events is described narratively in the Inworld docs; they are represented here as a generic typed event so consumers do not assume unverified field shapes.

Servers

wss
production api.inworld.ai
Inworld production WebSocket host. Used for all three runtime WebSocket endpoints (TTS, STT, Realtime).

AsyncAPI Specification

Raw ↑
asyncapi: '2.6.0'
id: 'urn:com:inworld:realtime:asyncapi'
info:
  title: Inworld AI Runtime WebSocket APIs
  version: '1.0.0'
  description: |
    AsyncAPI description of Inworld AI's publicly documented runtime WebSocket
    surface. Inworld exposes three independent WebSocket endpoints:

      * **TTS streaming** — bidirectional text-to-speech synthesis with
        per-context configuration, flush, and close semantics
        (`/tts/v1/voice:streamBidirectional`).
      * **STT streaming** — bidirectional speech-to-text transcription that
        accepts a config message, base64-encoded audio chunks, and end-of-turn
        signals (`/stt/v1/transcribe:streamBidirectional`).
      * **Realtime API** — end-to-end speech-to-speech sessions using an
        OpenAI-Realtime-API-compatible event protocol
        (`/api/v1/realtime/session`).

    All three endpoints authenticate with HTTP Basic using a Base64-encoded
    Inworld API key (Authorization header), per the published documentation.

    This document is a non-fabricated reconstruction of the message shapes
    published at https://docs.inworld.ai (TTS WebSocket, STT WebSocket, and
    Realtime WebSocket reference pages). Field sets reflect the documented
    JSON payloads; many Realtime server-sent events are documented narratively
    by Inworld and are modeled here as open objects.
  termsOfService: https://inworld.ai/legal/terms-of-service
  contact:
    name: Inworld AI Support
    url: https://docs.inworld.ai/tts/resources/support
  license:
    name: Inworld AI Terms of Service
    url: https://inworld.ai/legal/terms-of-service
  x-apis-json-aid: 'inworld-ai'
  x-api-evangelist-source: https://docs.inworld.ai
defaultContentType: application/json
servers:
  production:
    url: api.inworld.ai
    protocol: wss
    description: Inworld production WebSocket host. Used for all three runtime
      WebSocket endpoints (TTS, STT, Realtime).
    security:
      - apiKey: []
channels:
  /tts/v1/voice:streamBidirectional:
    description: |
      Bidirectional WebSocket endpoint for streaming text-to-speech synthesis.
      A single connection may host up to five contexts; the account is limited
      to 20 concurrent connections. The connection auto-closes after 10
      minutes of inactivity across all contexts.
    bindings:
      ws:
        method: GET
        bindingVersion: '0.1.0'
    publish:
      operationId: ttsClientMessage
      summary: Messages sent from client to Inworld TTS.
      message:
        oneOf:
          - $ref: '#/components/messages/TtsCreateContext'
          - $ref: '#/components/messages/TtsSendText'
          - $ref: '#/components/messages/TtsFlushContext'
          - $ref: '#/components/messages/TtsCloseContext'
    subscribe:
      operationId: ttsServerMessage
      summary: Messages streamed from Inworld TTS to the client.
      message:
        oneOf:
          - $ref: '#/components/messages/TtsContextCreated'
          - $ref: '#/components/messages/TtsAudioChunk'
          - $ref: '#/components/messages/TtsFlushCompleted'
          - $ref: '#/components/messages/TtsContextClosed'
  /stt/v1/transcribe:streamBidirectional:
    description: |
      Bidirectional WebSocket endpoint for streaming speech-to-text
      transcription. The first client message must be a TranscribeConfig
      followed by AudioChunk messages and (optionally) EndTurn / CloseStream
      control messages.
    bindings:
      ws:
        method: GET
        bindingVersion: '0.1.0'
    publish:
      operationId: sttClientMessage
      summary: Messages sent from client to Inworld STT.
      message:
        oneOf:
          - $ref: '#/components/messages/SttTranscribeConfig'
          - $ref: '#/components/messages/SttAudioChunk'
          - $ref: '#/components/messages/SttEndTurn'
          - $ref: '#/components/messages/SttCloseStream'
    subscribe:
      operationId: sttServerMessage
      summary: Messages streamed from Inworld STT to the client.
      message:
        oneOf:
          - $ref: '#/components/messages/SttTranscription'
          - $ref: '#/components/messages/SttSpeechStarted'
          - $ref: '#/components/messages/SttSpeechStopped'
          - $ref: '#/components/messages/SttUsage'
  /api/v1/realtime/session:
    description: |
      Bidirectional WebSocket endpoint for Inworld Realtime sessions. The
      protocol is intentionally compatible with the OpenAI Realtime API event
      protocol; existing OpenAI Realtime clients can connect by changing the
      base URL and authentication header. Inworld extends the session object
      with a `providerData` field for STT, TTS, memory, backchannel, and
      responsiveness extensions (see https://docs.inworld.ai/realtime/provider-data).
    parameters:
      key:
        description: Client-supplied session key (the JS quickstart uses
          `voice-${Date.now()}`).
        schema:
          type: string
      protocol:
        description: Protocol identifier; documented value is `realtime`.
        schema:
          type: string
          enum:
            - realtime
    bindings:
      ws:
        method: GET
        query:
          type: object
          properties:
            key:
              type: string
              description: Client-supplied session key.
            protocol:
              type: string
              enum:
                - realtime
        bindingVersion: '0.1.0'
    publish:
      operationId: realtimeClientEvent
      summary: Events sent from client to the Realtime session.
      message:
        oneOf:
          - $ref: '#/components/messages/RealtimeSessionUpdate'
          - $ref: '#/components/messages/RealtimeConversationItemCreate'
          - $ref: '#/components/messages/RealtimeConversationItemTruncate'
          - $ref: '#/components/messages/RealtimeConversationItemDelete'
          - $ref: '#/components/messages/RealtimeConversationItemRetrieve'
          - $ref: '#/components/messages/RealtimeResponseCreate'
          - $ref: '#/components/messages/RealtimeResponseCancel'
          - $ref: '#/components/messages/RealtimeInputAudioBufferAppend'
          - $ref: '#/components/messages/RealtimeInputAudioBufferCommit'
          - $ref: '#/components/messages/RealtimeInputAudioBufferClear'
          - $ref: '#/components/messages/RealtimeOutputAudioBufferClear'
    subscribe:
      operationId: realtimeServerEvent
      summary: Events streamed from the Realtime session back to the client.
      message:
        oneOf:
          - $ref: '#/components/messages/RealtimeSessionCreated'
          - $ref: '#/components/messages/RealtimeSessionUpdated'
          - $ref: '#/components/messages/RealtimeGenericServerEvent'
components:
  securitySchemes:
    apiKey:
      type: httpApiKey
      in: header
      name: Authorization
      description: |
        HTTP Basic authentication using the Base64-encoded Inworld API key
        copied from the Inworld Portal API Keys page. The header value is
        `Basic ${INWORLD_API_KEY}`. The Realtime quickstart sends the header
        on the upgrade request; the TTS and STT docs additionally allow
        passing the same value via the `authorization` query parameter.
  messages:
    # --- TTS messages -------------------------------------------------------
    TtsCreateContext:
      name: tts.create
      title: Create TTS context
      summary: Establishes an independent synthesis stream within the
        connection (max 5 per connection).
      payload:
        $ref: '#/components/schemas/TtsCreateContext'
    TtsSendText:
      name: tts.send_text
      title: Send text for synthesis
      summary: Sends up to 1000 characters of text to a context.
      payload:
        $ref: '#/components/schemas/TtsSendText'
    TtsFlushContext:
      name: tts.flush_context
      title: Flush buffered text
      summary: Manually triggers synthesis of buffered text.
      payload:
        $ref: '#/components/schemas/TtsFlushContext'
    TtsCloseContext:
      name: tts.close_context
      title: Close TTS context
      summary: Terminates a context and releases resources, flushing pending text.
      payload:
        $ref: '#/components/schemas/TtsCloseContext'
    TtsContextCreated:
      name: tts.contextCreated
      title: Context created (server)
      payload:
        $ref: '#/components/schemas/TtsContextCreatedResult'
    TtsAudioChunk:
      name: tts.audioChunk
      title: Audio chunk (server)
      payload:
        $ref: '#/components/schemas/TtsAudioChunkResult'
    TtsFlushCompleted:
      name: tts.flushCompleted
      title: Flush completed (server)
      payload:
        $ref: '#/components/schemas/TtsFlushCompletedResult'
    TtsContextClosed:
      name: tts.contextClosed
      title: Context closed (server)
      payload:
        $ref: '#/components/schemas/TtsContextClosedResult'
    # --- STT messages -------------------------------------------------------
    SttTranscribeConfig:
      name: stt.transcribeConfig
      title: Transcribe config (client, first message)
      payload:
        $ref: '#/components/schemas/SttTranscribeConfig'
    SttAudioChunk:
      name: stt.audioChunk
      title: Audio chunk (client)
      payload:
        $ref: '#/components/schemas/SttAudioChunk'
    SttEndTurn:
      name: stt.endTurn
      title: End turn (client)
      payload:
        type: object
        additionalProperties: false
        description: Empty payload signaling end of speaker turn (no-op for
          providers without manual turn-taking).
    SttCloseStream:
      name: stt.closeStream
      title: Close stream (client)
      payload:
        type: object
        additionalProperties: false
        description: Empty payload signaling no further audio will be sent.
    SttTranscription:
      name: stt.transcription
      title: Transcription (server)
      payload:
        $ref: '#/components/schemas/SttTranscription'
    SttSpeechStarted:
      name: stt.speechStarted
      title: Speech started (server)
      payload:
        $ref: '#/components/schemas/SttSpeechStarted'
    SttSpeechStopped:
      name: stt.speechStopped
      title: Speech stopped (server)
      payload:
        $ref: '#/components/schemas/SttSpeechStopped'
    SttUsage:
      name: stt.usage
      title: Usage (server, coming soon)
      payload:
        $ref: '#/components/schemas/SttUsage'
    # --- Realtime client events --------------------------------------------
    RealtimeSessionUpdate:
      name: session.update
      title: session.update (client)
      payload:
        $ref: '#/components/schemas/RealtimeSessionUpdate'
    RealtimeConversationItemCreate:
      name: conversation.item.create
      title: conversation.item.create (client)
      payload:
        $ref: '#/components/schemas/RealtimeConversationItemCreate'
    RealtimeConversationItemTruncate:
      name: conversation.item.truncate
      title: conversation.item.truncate (client)
      payload:
        $ref: '#/components/schemas/RealtimeConversationItemTruncate'
    RealtimeConversationItemDelete:
      name: conversation.item.delete
      title: conversation.item.delete (client)
      payload:
        $ref: '#/components/schemas/RealtimeConversationItemDelete'
    RealtimeConversationItemRetrieve:
      name: conversation.item.retrieve
      title: conversation.item.retrieve (client)
      payload:
        $ref: '#/components/schemas/RealtimeConversationItemRetrieve'
    RealtimeResponseCreate:
      name: response.create
      title: response.create (client)
      payload:
        $ref: '#/components/schemas/RealtimeResponseCreate'
    RealtimeResponseCancel:
      name: response.cancel
      title: response.cancel (client)
      payload:
        $ref: '#/components/schemas/RealtimeResponseCancel'
    RealtimeInputAudioBufferAppend:
      name: input_audio_buffer.append
      title: input_audio_buffer.append (client)
      payload:
        $ref: '#/components/schemas/RealtimeInputAudioBufferAppend'
    RealtimeInputAudioBufferCommit:
      name: input_audio_buffer.commit
      title: input_audio_buffer.commit (client)
      payload:
        $ref: '#/components/schemas/RealtimeTypedEvent'
    RealtimeInputAudioBufferClear:
      name: input_audio_buffer.clear
      title: input_audio_buffer.clear (client)
      payload:
        $ref: '#/components/schemas/RealtimeTypedEvent'
    RealtimeOutputAudioBufferClear:
      name: output_audio_buffer.clear
      title: output_audio_buffer.clear (client)
      payload:
        $ref: '#/components/schemas/RealtimeTypedEvent'
    # --- Realtime server events --------------------------------------------
    RealtimeSessionCreated:
      name: session.created
      title: session.created (server)
      payload:
        $ref: '#/components/schemas/RealtimeSessionCreated'
    RealtimeSessionUpdated:
      name: session.updated
      title: session.updated (server)
      payload:
        $ref: '#/components/schemas/RealtimeSessionUpdated'
    RealtimeGenericServerEvent:
      name: realtime.server_event
      title: Other Realtime server event
      summary: |
        Inworld documents additional server events that follow the OpenAI
        Realtime API protocol — including (non-exhaustive) `response.start`,
        `response.content_part.added`, `response.content_part.delta`,
        `response.done`, `conversation.item.created`,
        `input_audio_buffer.speech_started`, and
        `input_audio_buffer.speech_stopped`. The exact field set for these
        events is described narratively in the Inworld docs; they are
        represented here as a generic typed event so consumers do not assume
        unverified field shapes.
      payload:
        $ref: '#/components/schemas/RealtimeGenericServerEvent'
  schemas:
    # --- TTS schemas --------------------------------------------------------
    TtsAudioConfig:
      type: object
      properties:
        audioEncoding:
          type: string
          description: Audio encoding for synthesized output.
          enum:
            - LINEAR16
            - MP3
            - OGG_OPUS
            - ALAW
            - MULAW
            - PCM
            - WAV
        sampleRateHertz:
          type: integer
          description: Output sample rate (8000–48000 Hz).
          minimum: 8000
          maximum: 48000
        bitRate:
          type: integer
          description: Bit rate (encoding-dependent).
        speakingRate:
          type: number
          description: Speaking rate multiplier (0.5–1.5).
          minimum: 0.5
          maximum: 1.5
    TtsCreateContext:
      type: object
      required:
        - create
      properties:
        contextId:
          type: string
          description: Client-supplied context identifier. Auto-generated if omitted.
        create:
          type: object
          required:
            - voiceId
            - modelId
          properties:
            voiceId:
              type: string
            modelId:
              type: string
              description: e.g. `inworld-tts-2`, `inworld-tts-1-max`, `inworld-tts-1-mini`.
            audioConfig:
              $ref: '#/components/schemas/TtsAudioConfig'
            temperature:
              type: number
            timestampType:
              type: string
              enum:
                - WORD
                - CHARACTER
            maxBufferDelayMs:
              type: integer
              description: Maximum ms to buffer text before auto-flush.
            bufferCharThreshold:
              type: integer
              description: Character threshold (default/max 1000) for auto-flush.
            applyTextNormalization:
              type: string
              enum:
                - ON
                - OFF
            autoMode:
              type: boolean
            timestampTransportStrategy:
              type: string
              enum:
                - SYNC
                - ASYNC
            language:
              type: string
            deliveryMode:
              type: string
              enum:
                - BALANCED
                - FAST
                - QUALITY
    TtsSendText:
      type: object
      required:
        - send_text
      properties:
        contextId:
          type: string
        send_text:
          type: object
          required:
            - text
          properties:
            text:
              type: string
              maxLength: 1000
              description: Up to 1000 characters of text to synthesize.
            flush_context:
              type: object
              description: Empty object that triggers immediate synthesis.
    TtsFlushContext:
      type: object
      required:
        - flush_context
      properties:
        contextId:
          type: string
        flush_context:
          type: object
          additionalProperties: false
    TtsCloseContext:
      type: object
      required:
        - close_context
      properties:
        contextId:
          type: string
        close_context:
          type: object
          additionalProperties: false
    GrpcStatus:
      type: object
      description: gRPC status object accompanying every TTS server result.
      properties:
        code:
          type: integer
          description: gRPC status code. 0 = OK; non-zero indicates failure.
        message:
          type: string
        details:
          type: array
          items:
            type: object
    TtsContextCreatedResult:
      type: object
      properties:
        result:
          type: object
          properties:
            contextId:
              type: string
            contextCreated:
              type: object
              properties:
                voiceId: { type: string }
                modelId: { type: string }
                audioConfig:
                  $ref: '#/components/schemas/TtsAudioConfig'
                timestampType:
                  type: string
                autoMode:
                  type: boolean
                language:
                  type: string
                deliveryMode:
                  type: string
            status:
              $ref: '#/components/schemas/GrpcStatus'
    TtsAudioChunkResult:
      type: object
      properties:
        result:
          type: object
          properties:
            contextId:
              type: string
            audioChunk:
              type: object
              properties:
                audioContent:
                  type: string
                  format: byte
                  description: Base64-encoded audio bytes.
                usage:
                  type: object
                  properties:
                    processedCharactersCount:
                      type: integer
                    modelId:
                      type: string
                timestampInfo:
                  type: object
                  properties:
                    wordAlignment:
                      type: object
                      properties:
                        words:
                          type: array
                          items:
                            type: string
                        wordStartTimeSeconds:
                          type: array
                          items:
                            type: number
                        wordEndTimeSeconds:
                          type: array
                          items:
                            type: number
                    characterAlignment:
                      type: object
                      description: Present when timestampType=CHARACTER.
                status:
                  $ref: '#/components/schemas/GrpcStatus'
    TtsFlushCompletedResult:
      type: object
      properties:
        result:
          type: object
          properties:
            contextId:
              type: string
            flushCompleted:
              type: object
              additionalProperties: false
            status:
              $ref: '#/components/schemas/GrpcStatus'
    TtsContextClosedResult:
      type: object
      properties:
        result:
          type: object
          properties:
            contextId:
              type: string
            contextClosed:
              type: object
              additionalProperties: false
            status:
              $ref: '#/components/schemas/GrpcStatus'
    # --- STT schemas --------------------------------------------------------
    SttTranscribeConfig:
      type: object
      required:
        - modelId
        - audioEncoding
      properties:
        modelId:
          type: string
          enum:
            - inworld/inworld-stt-1
            - assemblyai/universal-streaming-multilingual
            - assemblyai/universal-streaming-english
            - assemblyai/u3-rt-pro
            - assemblyai/whisper-rt
            - soniox/stt-rt-v4
        audioEncoding:
          type: string
          enum:
            - AUTO_DETECT
            - LINEAR16
            - MP3
            - OGG_OPUS
            - FLAC
        sampleRateHertz:
          type: integer
        language:
          type: string
        numberOfChannels:
          type: integer
        inactivityTimeoutSeconds:
          type: integer
        endOfTurnConfidenceThreshold:
          type: number
        prompts:
          type: array
          items:
            type: string
        includeWordTimestamps:
          type: boolean
        assemblyaiConfig:
          type: object
          properties:
            vadThreshold:
              type: number
            minEndOfTurnSilenceWhenConfident:
              type: integer
            maxTurnSilence:
              type: integer
            prompt:
              type: string
        inworldSttV1Config:
          type: object
          properties:
            vadThreshold:
              type: number
            minEndOfTurnSilenceWhenConfident:
              type: integer
        sonioxConfig:
          type: object
          properties:
            languageHints:
              type: array
              items:
                type: string
            languageHintsStrict:
              type: boolean
            enableEndpointDetection:
              type: boolean
            maxEndpointDelayMs:
              type: integer
            context:
              type: object
              properties:
                general:
                  type: object
                text:
                  type: string
                terms:
                  type: array
                  items:
                    type: string
        groqConfig:
          type: object
          properties:
            temperature:
              type: number
        voiceProfileConfig:
          type: object
          properties:
            enableVoiceProfile:
              type: boolean
            topN:
              type: integer
    SttAudioChunk:
      type: object
      required:
        - content
      properties:
        content:
          type: string
          format: byte
          description: Base64-encoded audio bytes matching the negotiated encoding.
    SttTranscription:
      type: object
      properties:
        transcript:
          type: string
        isFinal:
          type: boolean
        wordTimestamps:
          type: array
          items:
            type: object
            properties:
              word:
                type: string
              confidence:
                type: number
                minimum: 0
                maximum: 1
              startTimeMs:
                type: integer
              endTimeMs:
                type: integer
    SttSpeechStarted:
      type: object
      properties:
        startTimeMs:
          type: integer
        confidence:
          type: number
          minimum: 0
          maximum: 1
    SttSpeechStopped:
      type: object
      properties:
        silenceDurationMs:
          type: integer
    SttUsage:
      type: object
      properties:
        transcribedAudioMs:
          type: integer
        modelId:
          type: string
    # --- Realtime schemas ---------------------------------------------------
    RealtimeTypedEvent:
      type: object
      required:
        - type
      properties:
        type:
          type: string
          description: Event type identifier (e.g. `input_audio_buffer.commit`).
        event_id:
          type: string
          description: Optional client-supplied event id; echoed by some events.
    RealtimeAudioFormat:
      type: object
      properties:
        type:
          type: string
          enum:
            - audio/pcm
            - audio/pcmu
            - audio/pcma
            - audio/float32
        rate:
          type: integer
    RealtimeSessionAudioInput:
      type: object
      properties:
        format:
          $ref: '#/components/schemas/RealtimeAudioFormat'
        noise_reduction:
          type: object
          properties:
            type:
              type: string
              enum:
                - near_field
                - far_field
        transcription:
          type: object
          properties:
            model:
              type: string
            language:
              type: string
            prompt:
              type: string
        turn_detection:
          type: object
          properties:
            type:
              type: string
              enum:
                - server_vad
                - semantic_vad
            threshold:
              type: number
            eagerness:
              type: string
            create_response:
              type: boolean
            interrupt_response:
              type: boolean
    RealtimeSessionAudioOutput:
      type: object
      properties:
        format:
          $ref: '#/components/schemas/RealtimeAudioFormat'
        voice:
          type: string
        model:
          type: string
        speed:
          type: number
          minimum: 0.25
          maximum: 1.5
    RealtimeSession:
      type: object
      properties:
        id:
          type: string
        object:
          type: string
        expires_at:
          type: integer
        model:
          type: string
        instructions:
          type: string
        output_modalities:
          type: array
          items:
            type: string
            enum:
              - text
              - audio
        temperature:
          type: number
        max_output_tokens:
          oneOf:
            - type: integer
              minimum: 1
              maximum: 4096
            - type: string
              enum:
                - inf
        audio:
          type: object
          properties:
            input:
              $ref: '#/components/schemas/RealtimeSessionAudioInput'
            output:
              $ref: '#/components/schemas/RealtimeSessionAudioOutput'
        tools:
          type: array
          items:
            type: object
        tool_choice:
          oneOf:
            - type: string
            - type: object
        truncation:
          oneOf:
            - type: string
            - type: object
        providerData:
          type: object
          description: |
            Inworld-specific extensions. Documented sub-objects include
            transcription, segmentation, memory, backchannel, and
            responsiveness. See https://docs.inworld.ai/realtime/provider-data.
    RealtimeSessionUpdate:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          required:
            - session
          properties:
            type:
              type: string
              enum:
                - session.update
            session:
              $ref: '#/components/schemas/RealtimeSession'
    RealtimeContentPart:
      type: object
      required:
        - type
      properties:
        type:
          type: string
          enum:
            - input_text
            - input_audio
            - text
            - audio
        text:
          type: string
        audio:
          type: string
          format: byte
          description: Base64-encoded audio.
        transcript:
          type: string
    RealtimeConversationItem:
      type: object
      required:
        - type
      properties:
        id:
          type: string
        type:
          type: string
          description: e.g. `message`, `function_call_result`.
        role:
          type: string
          enum:
            - system
            - user
            - assistant
            - tool
        content:
          type: array
          items:
            $ref: '#/components/schemas/RealtimeContentPart'
    RealtimeConversationItemCreate:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          required:
            - item
          properties:
            type:
              type: string
              enum:
                - conversation.item.create
            item:
              $ref: '#/components/schemas/RealtimeConversationItem'
    RealtimeConversationItemTruncate:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          required:
            - item_id
            - content_index
            - audio_end_ms
          properties:
            type:
              type: string
              enum:
                - conversation.item.truncate
            item_id:
              type: string
            content_index:
              type: integer
            audio_end_ms:
              type: integer
    RealtimeConversationItemDelete:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          required:
            - item_id
          properties:
            type:
              type: string
              enum:
                - conversation.item.delete
            item_id:
              type: string
    RealtimeConversationItemRetrieve:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          required:
            - item_id
          properties:
            type:
              type: string
              enum:
                - conversation.item.retrieve
            item_id:
              type: string
    RealtimeResponseCreate:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          properties:
            type:
              type: string
              enum:
                - response.create
            response:
              type: object
              properties:
                conversation:
                  oneOf:
                    - type: string
                      enum:
                        - auto
                    - type: string
                output_modalities:
                  type: array
                  items:
                    type: string
                    enum:
                      - text
                      - audio
                instructions:
                  type: string
                voice:
                  type: string
                max_output_tokens:
                  oneOf:
                    - type: integer
                    - type: string
                      enum:
                        - inf
                tool_choice:
                  oneOf:
                    - type: string
                    - type: object
                tools:
                  type: array
                  items:
                    type: object
    RealtimeResponseCancel:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          properties:
            type:
              type: string
              enum:
                - response.cancel
            response_id:
              type: string
              description: Specific response to cancel; omit to cancel active response.
    RealtimeInputAudioBufferAppend:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          required:
            - audio
          properties:
            type:
              type: string
              enum:
                - input_audio_buffer.append
            audio:
              type: string
              format: byte
              description: Base64-encoded audio chunk (100–200 ms recommended).
    RealtimeSessionCreated:
      allOf:
        - $ref: '#/components/schemas/RealtimeTypedEvent'
        - type: object
          properties:
            type:
              type: string
              enum:
                - session.created
            s

# --- truncated at 32 KB (33 KB total) ---
# Full source: https://raw.githubusercontent.com/api-evangelist/inworld-ai/refs/heads/main/asyncapi/inworld-ai-asyncapi.yml