Hume AI · AsyncAPI Specification

Hume AI WebSocket APIs

Version 1.0.0

Consolidated AsyncAPI definition for Hume AI's two production WebSocket surfaces: - **Empathic Voice Interface (EVI)** — bidirectional speech-to-speech voice conversation at `wss://api.hume.ai/v0/evi/chat`, plus a read/write secondary connection at `wss://api.hume.ai/v0/evi/chat/{chat_id}/connect`. - **Expression Measurement (Stream)** — streaming multimodal emotion inference at `wss://api.hume.ai/v0/stream/models` over face, prosody, language and burst models. Message names, payload field names and `type` discriminator values are taken from Hume's own published AsyncAPI documents at https://dev.hume.ai/asyncapi/speech-to-speech-evi.yaml and https://dev.hume.ai/asyncapi/expression-measurement-api.yaml.

View Spec View on GitHub AIVoiceEmpathicEmotionMultimodalAsyncAPIWebhooksEvents

Channels

/chat

publish eviChatSend

Messages the client sends to EVI.

Real-time EVI chat. Client sends audio and control messages; server streams transcripts, assistant text, synthesized audio and tool events. Connection URL: `wss://api.hume.ai/v0/evi/chat`.

/chat/{chat_id}/connect

publish eviChatConnectSend

Control-plane messages the secondary client sends to EVI.

Secondary connection to an in-progress EVI chat. The original chat must have been opened with `allow_connection=true`. The secondary connection can send the same control-plane messages as `/chat` except `audio_input`, and receives the same subscribe events. Connection URL: `wss://api.hume.ai/v0/evi/chat/{chat_id}/connect`.

/models

publish streamModelsSend

Streaming inference request from the client.

Streaming multimodal expression measurement inference. Connection URL: `wss://api.hume.ai/v0/stream/models`. Each client message includes a `models` configuration and the `data` payload (Base64-encoded media or raw text). Hume returns a per-model predictions envelope, an error envelope, or a warning.

Messages

✉

AudioInput

Audio Input

Base64-encoded audio chunk treated as user speech.

✉

SessionSettings

Session Settings

Configure session-level parameters such as audio encoding, context, language model, tools and variables.

✉

UserInput

User Input

Plain text inserted into the conversation as the user.

✉

AssistantInput

Assistant Input

Plain text the assistant should synthesize and speak.

✉

PauseAssistantMessage

Pause Assistant Message

Pause assistant responses while still recording user audio.

✉

ResumeAssistantMessage

Resume Assistant Message

Resume assistant responses after a pause.

✉

ChatMetadata

Chat Metadata

Sent once at the start of a connection with chat and chat-group identifiers.

✉

UserMessage

User Message

Transcript and prosody scores for a user utterance.

✉

AssistantMessage

Assistant Message

A piece of generated assistant text returned by the language model.

✉

AssistantProsody

Assistant Prosody

Predicted expression scores for an assistant utterance.

✉

AudioOutput

Audio Output

Base64-encoded chunk of synthesized assistant audio.

✉

AssistantEnd

Assistant End

Marks the end of an assistant turn.

✉

UserInterruption

User Interruption

Signals that the user started speaking and EVI interrupted itself.

✉

ToolCallMessage

Tool Call

Request from EVI to invoke a registered tool.

✉

ToolResponseMessage

Tool Response

Successful response to a tool call.

✉

ToolErrorMessage

Tool Error

Error response to a tool call.

✉

WebSocketError

WebSocket Error

WebSocket-level error emitted by the EVI server.

✉

ModelsInput

Models Input

Streaming inference request - models config + media payload.

✉

ModelsSuccess

Models Success

Per-model predictions for a streamed input.

✉

ModelsError

Models Error

Error returned by the streaming inference server.

✉

ModelsWarning

Models Warning

Non-fatal warning returned by the streaming inference server.

Servers

wss

evi wss://api.hume.ai/v0/evi

Empathic Voice Interface (EVI) WebSocket server.

wss

stream wss://api.hume.ai/v0/stream

Expression Measurement streaming inference WebSocket server.

AsyncAPI Specification

asyncapi: 2.6.0
info:
  title: Hume AI WebSocket APIs
  version: 1.0.0
  description: |
    Consolidated AsyncAPI definition for Hume AI's two production WebSocket
    surfaces:

    - **Empathic Voice Interface (EVI)** — bidirectional speech-to-speech
      voice conversation at `wss://api.hume.ai/v0/evi/chat`, plus a
      read/write secondary connection at `wss://api.hume.ai/v0/evi/chat/{chat_id}/connect`.
    - **Expression Measurement (Stream)** — streaming multimodal emotion
      inference at `wss://api.hume.ai/v0/stream/models` over face, prosody,
      language and burst models.

    Message names, payload field names and `type` discriminator values are
    taken from Hume's own published AsyncAPI documents at
    https://dev.hume.ai/asyncapi/speech-to-speech-evi.yaml and
    https://dev.hume.ai/asyncapi/expression-measurement-api.yaml.
  contact:
    name: Hume AI Developer Platform
    url: https://dev.hume.ai/
  license:
    name: Proprietary - Hume AI Terms of Service
    url: https://www.hume.ai/terms-of-service

servers:
  evi:
    url: wss://api.hume.ai/v0/evi
    protocol: wss
    description: Empathic Voice Interface (EVI) WebSocket server.
    security:
      - apiKey: []
      - accessToken: []
  stream:
    url: wss://api.hume.ai/v0/stream
    protocol: wss
    description: Expression Measurement streaming inference WebSocket server.
    security:
      - humeApiKeyHeader: []

channels:
  /chat:
    description: |
      Real-time EVI chat. Client sends audio and control messages; server
      streams transcripts, assistant text, synthesized audio and tool events.
      Connection URL: `wss://api.hume.ai/v0/evi/chat`.
    bindings:
      ws:
        query:
          type: object
          properties:
            access_token:
              type: string
              description: Short-lived access token (Bearer).
            api_key:
              type: string
              description: Hume API key (alternative to access_token).
            config_id:
              type: string
              description: ID of the EVI configuration to use.
            config_version:
              type: integer
              description: Specific version of the EVI configuration to use.
            event_limit:
              type: integer
              description: Maximum number of events to return for this chat session.
            resumed_chat_group_id:
              type: string
              description: ID of an existing chat group to resume.
            verbose_transcription:
              type: boolean
              default: false
              description: When true, emits interim transcription updates.
            allow_connection:
              type: boolean
              default: false
              description: When true, allows a secondary client to connect to this chat via `/chat/{chat_id}/connect`.
    publish:
      operationId: eviChatSend
      summary: Messages the client sends to EVI.
      message:
        oneOf:
          - $ref: '#/components/messages/AudioInput'
          - $ref: '#/components/messages/SessionSettings'
          - $ref: '#/components/messages/UserInput'
          - $ref: '#/components/messages/AssistantInput'
          - $ref: '#/components/messages/ToolResponseMessage'
          - $ref: '#/components/messages/ToolErrorMessage'
          - $ref: '#/components/messages/PauseAssistantMessage'
          - $ref: '#/components/messages/ResumeAssistantMessage'
    subscribe:
      operationId: eviChatReceive
      summary: Messages EVI streams back to the client.
      message:
        oneOf:
          - $ref: '#/components/messages/ChatMetadata'
          - $ref: '#/components/messages/UserMessage'
          - $ref: '#/components/messages/AssistantMessage'
          - $ref: '#/components/messages/AssistantProsody'
          - $ref: '#/components/messages/AudioOutput'
          - $ref: '#/components/messages/AssistantEnd'
          - $ref: '#/components/messages/UserInterruption'
          - $ref: '#/components/messages/ToolCallMessage'
          - $ref: '#/components/messages/ToolResponseMessage'
          - $ref: '#/components/messages/ToolErrorMessage'
          - $ref: '#/components/messages/WebSocketError'

  /chat/{chat_id}/connect:
    description: |
      Secondary connection to an in-progress EVI chat. The original chat
      must have been opened with `allow_connection=true`. The secondary
      connection can send the same control-plane messages as `/chat`
      except `audio_input`, and receives the same subscribe events.
      Connection URL: `wss://api.hume.ai/v0/evi/chat/{chat_id}/connect`.
    parameters:
      chat_id:
        description: The ID of the chat to connect to.
        schema:
          type: string
    bindings:
      ws:
        query:
          type: object
          properties:
            access_token:
              type: string
    publish:
      operationId: eviChatConnectSend
      summary: Control-plane messages the secondary client sends to EVI.
      message:
        oneOf:
          - $ref: '#/components/messages/SessionSettings'
          - $ref: '#/components/messages/UserInput'
          - $ref: '#/components/messages/AssistantInput'
          - $ref: '#/components/messages/ToolResponseMessage'
          - $ref: '#/components/messages/ToolErrorMessage'
          - $ref: '#/components/messages/PauseAssistantMessage'
          - $ref: '#/components/messages/ResumeAssistantMessage'
    subscribe:
      operationId: eviChatConnectReceive
      summary: Events streamed to the secondary client.
      message:
        oneOf:
          - $ref: '#/components/messages/ChatMetadata'
          - $ref: '#/components/messages/UserMessage'
          - $ref: '#/components/messages/AssistantMessage'
          - $ref: '#/components/messages/AssistantProsody'
          - $ref: '#/components/messages/AudioOutput'
          - $ref: '#/components/messages/AssistantEnd'
          - $ref: '#/components/messages/UserInterruption'
          - $ref: '#/components/messages/ToolCallMessage'
          - $ref: '#/components/messages/ToolResponseMessage'
          - $ref: '#/components/messages/ToolErrorMessage'
          - $ref: '#/components/messages/WebSocketError'

  /models:
    description: |
      Streaming multimodal expression measurement inference.
      Connection URL: `wss://api.hume.ai/v0/stream/models`.
      Each client message includes a `models` configuration and the
      `data` payload (Base64-encoded media or raw text). Hume returns
      a per-model predictions envelope, an error envelope, or a warning.
    bindings:
      ws:
        headers:
          type: object
          properties:
            X-Hume-Api-Key:
              type: string
              description: Hume API key used to authenticate the stream.
    publish:
      operationId: streamModelsSend
      summary: Streaming inference request from the client.
      message:
        $ref: '#/components/messages/ModelsInput'
    subscribe:
      operationId: streamModelsReceive
      summary: Streaming inference response from the server.
      message:
        oneOf:
          - $ref: '#/components/messages/ModelsSuccess'
          - $ref: '#/components/messages/ModelsError'
          - $ref: '#/components/messages/ModelsWarning'

components:
  securitySchemes:
    apiKey:
      type: apiKey
      in: query
      name: api_key
      description: Hume API key supplied as a query parameter.
    accessToken:
      type: apiKey
      in: query
      name: access_token
      description: Short-lived access token supplied as a query parameter.
    humeApiKeyHeader:
      type: apiKey
      in: header
      name: X-Hume-Api-Key
      description: Hume API key supplied as a connection header.

  messages:
    # ---------- EVI client-sent (publish) ----------
    AudioInput:
      name: audio_input
      title: Audio Input
      summary: Base64-encoded audio chunk treated as user speech.
      payload:
        $ref: '#/components/schemas/AudioInput'
    SessionSettings:
      name: session_settings
      title: Session Settings
      summary: Configure session-level parameters such as audio encoding, context, language model, tools and variables.
      payload:
        $ref: '#/components/schemas/SessionSettings'
    UserInput:
      name: user_input
      title: User Input
      summary: Plain text inserted into the conversation as the user.
      payload:
        $ref: '#/components/schemas/UserInput'
    AssistantInput:
      name: assistant_input
      title: Assistant Input
      summary: Plain text the assistant should synthesize and speak.
      payload:
        $ref: '#/components/schemas/AssistantInput'
    PauseAssistantMessage:
      name: pause_assistant_message
      title: Pause Assistant Message
      summary: Pause assistant responses while still recording user audio.
      payload:
        $ref: '#/components/schemas/PauseAssistantMessage'
    ResumeAssistantMessage:
      name: resume_assistant_message
      title: Resume Assistant Message
      summary: Resume assistant responses after a pause.
      payload:
        $ref: '#/components/schemas/ResumeAssistantMessage'

    # ---------- EVI server-sent (subscribe) ----------
    ChatMetadata:
      name: chat_metadata
      title: Chat Metadata
      summary: Sent once at the start of a connection with chat and chat-group identifiers.
      payload:
        $ref: '#/components/schemas/ChatMetadata'
    UserMessage:
      name: user_message
      title: User Message
      summary: Transcript and prosody scores for a user utterance.
      payload:
        $ref: '#/components/schemas/UserMessage'
    AssistantMessage:
      name: assistant_message
      title: Assistant Message
      summary: A piece of generated assistant text returned by the language model.
      payload:
        $ref: '#/components/schemas/AssistantMessage'
    AssistantProsody:
      name: assistant_prosody
      title: Assistant Prosody
      summary: Predicted expression scores for an assistant utterance.
      payload:
        $ref: '#/components/schemas/AssistantProsody'
    AudioOutput:
      name: audio_output
      title: Audio Output
      summary: Base64-encoded chunk of synthesized assistant audio.
      payload:
        $ref: '#/components/schemas/AudioOutput'
    AssistantEnd:
      name: assistant_end
      title: Assistant End
      summary: Marks the end of an assistant turn.
      payload:
        $ref: '#/components/schemas/AssistantEnd'
    UserInterruption:
      name: user_interruption
      title: User Interruption
      summary: Signals that the user started speaking and EVI interrupted itself.
      payload:
        $ref: '#/components/schemas/UserInterruption'
    ToolCallMessage:
      name: tool_call
      title: Tool Call
      summary: Request from EVI to invoke a registered tool.
      payload:
        $ref: '#/components/schemas/ToolCallMessage'

    # ---------- Shared (sent by either side) ----------
    ToolResponseMessage:
      name: tool_response
      title: Tool Response
      summary: Successful response to a tool call.
      payload:
        $ref: '#/components/schemas/ToolResponseMessage'
    ToolErrorMessage:
      name: tool_error
      title: Tool Error
      summary: Error response to a tool call.
      payload:
        $ref: '#/components/schemas/ToolErrorMessage'
    WebSocketError:
      name: error
      title: WebSocket Error
      summary: WebSocket-level error emitted by the EVI server.
      payload:
        $ref: '#/components/schemas/WebSocketError'

    # ---------- Expression Measurement ----------
    ModelsInput:
      name: models_input
      title: Models Input
      summary: Streaming inference request - models config + media payload.
      payload:
        $ref: '#/components/schemas/ModelsInput'
    ModelsSuccess:
      name: models_success
      title: Models Success
      summary: Per-model predictions for a streamed input.
      payload:
        $ref: '#/components/schemas/ModelsSuccess'
    ModelsError:
      name: models_error
      title: Models Error
      summary: Error returned by the streaming inference server.
      payload:
        $ref: '#/components/schemas/ModelsError'
    ModelsWarning:
      name: models_warning
      title: Models Warning
      summary: Non-fatal warning returned by the streaming inference server.
      payload:
        $ref: '#/components/schemas/ModelsWarning'

  schemas:

    # ---------- EVI: client-sent ----------
    AudioInput:
      type: object
      required: [type, data]
      properties:
        type:
          type: string
          enum: [audio_input]
        data:
          type: string
          format: base64
          description: Base64-encoded audio chunk.
        custom_session_id:
          type: string
          nullable: true

    UserInput:
      type: object
      required: [type, text]
      properties:
        type:
          type: string
          enum: [user_input]
        text:
          type: string
        custom_session_id:
          type: string
          nullable: true

    AssistantInput:
      type: object
      required: [type, text]
      properties:
        type:
          type: string
          enum: [assistant_input]
        text:
          type: string
        custom_session_id:
          type: string
          nullable: true

    PauseAssistantMessage:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [pause_assistant_message]
        custom_session_id:
          type: string
          nullable: true

    ResumeAssistantMessage:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [resume_assistant_message]
        custom_session_id:
          type: string
          nullable: true

    SessionSettings:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [session_settings]
        audio:
          type: object
          description: Audio encoding settings (channels, encoding, sample_rate).
          properties:
            channels:
              type: integer
            encoding:
              type: string
              enum: [linear16]
            sample_rate:
              type: integer
        context:
          type: object
          description: Context text appended to user messages, either persistent or temporary.
          properties:
            text:
              type: string
            type:
              type: string
              enum: [persistent, temporary]
        system_prompt:
          type: string
          nullable: true
        language_model:
          type: object
          description: Override the language model used by EVI for this session.
          properties:
            model_provider:
              type: string
            model_resource:
              type: string
            temperature:
              type: number
        voice:
          type: object
          description: Override the voice used by EVI for this session.
        tools:
          type: array
          description: Tools available to the assistant for this session.
          items:
            type: object
        builtin_tools:
          type: array
          items:
            type: object
        variables:
          type: object
          additionalProperties:
            type: string
          description: Dynamic variables interpolated into the system prompt.
        metadata:
          type: object
          additionalProperties: true
        custom_session_id:
          type: string
          nullable: true

    # ---------- EVI: shared tool messages ----------
    ToolResponseMessage:
      type: object
      required: [type, tool_call_id, content]
      properties:
        type:
          type: string
          enum: [tool_response]
        tool_call_id:
          type: string
        content:
          type: string
          description: Result returned to the assistant from the tool.
        tool_name:
          type: string
        tool_type:
          type: string
          enum: [builtin, function]
        custom_session_id:
          type: string
          nullable: true

    ToolErrorMessage:
      type: object
      required: [type, tool_call_id, error]
      properties:
        type:
          type: string
          enum: [tool_error]
        tool_call_id:
          type: string
        error:
          type: string
          description: Error message from the tool call, not exposed to the user.
        code:
          type: string
        content:
          type: string
          description: User-facing content to surface in place of the failed tool result.
        level:
          type: string
          enum: [warn]
        tool_type:
          type: string
          enum: [builtin, function]
        custom_session_id:
          type: string
          nullable: true

    # ---------- EVI: server-sent ----------
    ChatMetadata:
      type: object
      required: [type, chat_id, chat_group_id]
      properties:
        type:
          type: string
          enum: [chat_metadata]
        chat_id:
          type: string
        chat_group_id:
          type: string
        request_id:
          type: string
        custom_session_id:
          type: string
          nullable: true

    UserMessage:
      type: object
      required: [type, message]
      properties:
        type:
          type: string
          enum: [user_message]
        message:
          type: object
          properties:
            role:
              type: string
              enum: [user]
            content:
              type: string
        models:
          type: object
          description: Expression measurement predictions for the user utterance.
          properties:
            prosody:
              type: object
        from_text:
          type: boolean
        interim:
          type: boolean
        time:
          type: object
          properties:
            begin:
              type: integer
            end:
              type: integer
        custom_session_id:
          type: string
          nullable: true

    AssistantMessage:
      type: object
      required: [type, message]
      properties:
        type:
          type: string
          enum: [assistant_message]
        id:
          type: string
        message:
          type: object
          properties:
            role:
              type: string
              enum: [assistant]
            content:
              type: string
        models:
          type: object
        from_text:
          type: boolean
        custom_session_id:
          type: string
          nullable: true

    AssistantProsody:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [assistant_prosody]
        id:
          type: string
        models:
          type: object
        custom_session_id:
          type: string
          nullable: true

    AudioOutput:
      type: object
      required: [type, data]
      properties:
        type:
          type: string
          enum: [audio_output]
        id:
          type: string
        data:
          type: string
          format: base64
          description: Base64-encoded synthesized assistant audio chunk.
        custom_session_id:
          type: string
          nullable: true

    AssistantEnd:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [assistant_end]
        custom_session_id:
          type: string
          nullable: true

    UserInterruption:
      type: object
      required: [type, time]
      properties:
        type:
          type: string
          enum: [user_interruption]
        time:
          type: integer
        custom_session_id:
          type: string
          nullable: true

    ToolCallMessage:
      type: object
      required: [type, tool_call_id, name, parameters]
      properties:
        type:
          type: string
          enum: [tool_call]
        tool_call_id:
          type: string
        name:
          type: string
        parameters:
          type: string
          description: JSON-encoded arguments for the tool call.
        tool_type:
          type: string
          enum: [builtin, function]
        response_required:
          type: boolean
        custom_session_id:
          type: string
          nullable: true

    WebSocketError:
      type: object
      required: [type, message, code]
      properties:
        type:
          type: string
          enum: [error]
        code:
          type: string
        slug:
          type: string
        message:
          type: string
        custom_session_id:
          type: string
          nullable: true

    # ---------- Expression Measurement ----------
    ModelsInput:
      type: object
      required: [models]
      properties:
        models:
          type: object
          description: Map of models to run. Each key may be `face`, `prosody`, `language`, or `burst`.
          properties:
            face:
              type: object
              description: Facial expression model configuration.
              properties:
                facs:
                  type: object
                descriptions:
                  type: object
                identify_faces:
                  type: boolean
                fps_pred:
                  type: number
                prob_threshold:
                  type: number
                min_face_size:
                  type: number
                save_faces:
                  type: boolean
            prosody:
              type: object
              description: Vocal prosody (speech) model configuration.
              properties:
                granularity:
                  type: string
                  enum: [word, sentence, utterance, conversational_turn]
                identify_speakers:
                  type: boolean
            language:
              type: object
              description: Language (text) model configuration.
              properties:
                granularity:
                  type: string
                  enum: [word, sentence, utterance, conversational_turn]
                identify_speakers:
                  type: boolean
            burst:
              type: object
              description: Vocal burst model configuration.
        data:
          type: string
          format: base64
          description: Base64-encoded media payload (image, audio or video) or, for the language model, the raw text.
        raw_text:
          type: boolean
          description: When true with the language model, treat `data` as raw UTF-8 text rather than a Base64-encoded file.
        job_details:
          type: boolean
          description: Include job-level details in the response.
        payload_id:
          type: string
          description: Client-supplied correlation id echoed back on the response.
        reset_stream:
          type: boolean
          description: Reset accumulated context (e.g. face identification, prosody context) on this stream.
        stream_window_ms:
          type: number
          description: Sliding window length, in milliseconds, used to aggregate streamed audio/video.

    ModelsSuccess:
      type: object
      properties:
        face:
          type: object
          description: Facial expression predictions.
        prosody:
          type: object
          description: Vocal prosody predictions.
        language:
          type: object
          description: Language (text) predictions.
        burst:
          type: object
          description: Vocal burst predictions.
        job_details:
          type: object
          properties:
            job_id:
              type: string
        payload_id:
          type: string
        time:
          type: object
          properties:
            begin:
              type: integer
            end:
              type: integer

    ModelsError:
      type: object
      required: [error]
      properties:
        error:
          type: string
        code:
          type: string
        payload_id:
          type: string

    ModelsWarning:
      type: object
      required: [warning]
      properties:
        warning:
          type: string
        code:
          type: string
        payload_id:
          type: string