elevenlabs · AsyncAPI Specification
ElevenLabs Conversational AI Events
Version 1.0
The ElevenLabs Conversational AI WebSocket API enables real-time voice conversations with AI agents. Over a WebSocket connection it supports bidirectional audio streaming, text events, and conversation lifecycle management: clients send audio input and receive audio responses, transcriptions, and metadata events.
Channels
/conversation
Receive conversation events from the agent
Bidirectional WebSocket channel for real-time conversational AI interactions. Clients send audio input and receive agent audio responses, transcriptions, and conversation events.
/monitoring
Receive monitoring events
WebSocket channel for real-time monitoring of active agent conversations. Provides text events and metadata for live observation and intervention.
Messages
ConversationInitiationMetadata
Conversation Initiation Metadata
Metadata sent when the WebSocket connection is established
AgentAudioEvent
Agent Audio
Audio chunk from the agent's speech output
AgentResponseEvent
Agent Response
Text of the agent's response
UserTranscriptEvent
User Transcript
Transcription of the user's speech input
ConversationEndEvent
Conversation End
Signals the end of the conversation
AgentInterruptionEvent
Agent Interruption
Signals that the agent was interrupted
PingEvent
Ping
Server ping for connection keep-alive
UserAudioInput
User Audio Input
Audio chunk from the user's microphone
PongResponse
Pong Response
Client pong response to server ping
MonitoringTranscriptEvent
Monitoring Transcript
Live transcript event during monitoring
MonitoringAgentResponseEvent
Monitoring Agent Response
Agent response during monitoring
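To make the message catalog above more concrete, here is a minimal sketch of how a client might route incoming events and answer a keep-alive ping. This listing gives message names but not payload schemas, so the JSON shape is an assumption: the `type` discriminator and its values (`"ping"`, `"pong"`, `"agent_response"`), the `ping_event` and `event_id` fields, and the helper names `build_pong` and `route_event` are hypothetical and should be checked against the actual schema.

```python
import json
from typing import Callable, Dict, Optional

# ASSUMPTION: every event is a JSON object carrying a "type" discriminator.
# The type strings and field names used here are illustrative only.


def build_pong(ping: dict) -> str:
    """Build a PongResponse frame that echoes the ping's event id (assumed field)."""
    event_id = ping.get("ping_event", {}).get("event_id")
    return json.dumps({"type": "pong", "event_id": event_id})


def route_event(raw: str, handlers: Dict[str, Callable[[dict], None]]) -> Optional[str]:
    """Decode one incoming frame and dispatch it by its assumed "type" field.

    Returns a reply frame to send back (only for pings), otherwise None.
    Events with no registered handler are ignored.
    """
    event = json.loads(raw)
    event_type = event.get("type")
    if event_type == "ping":
        return build_pong(event)
    handler = handlers.get(event_type)
    if handler is not None:
        handler(event)
    return None


if __name__ == "__main__":
    responses = []
    handlers = {"agent_response": responses.append}
    route_event('{"type": "agent_response", "text": "hello"}', handlers)
    reply = route_event('{"type": "ping", "ping_event": {"event_id": 7}}', handlers)
    print(responses, reply)
```

Keeping the routing separate from the socket I/O means the same function could serve either channel; only the set of registered handlers would differ.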
Servers
production (wss)
wss://api.elevenlabs.io/v1/convai/conversation
ElevenLabs Conversational AI WebSocket server for real-time voice agent interactions.
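As a sketch of how a client might derive a connection URL from the production server entry above: the `agent_id` query parameter and the helper name `conversation_url` are assumptions that this section does not state, so confirm the required query parameters and authentication before relying on them.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Taken from the Servers entry above.
PRODUCTION_URL = "wss://api.elevenlabs.io/v1/convai/conversation"


def conversation_url(agent_id: str, base: str = PRODUCTION_URL) -> str:
    """Return the server URL with an agent_id query parameter (ASSUMED name)."""
    parts = urlsplit(base)
    if parts.scheme != "wss":
        raise ValueError("expected a wss:// URL")
    query = urlencode({"agent_id": agent_id})  # percent-encodes unsafe characters
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(conversation_url("agent_123"))
```

The resulting string can be passed to any WebSocket client library; building it with `urllib.parse` instead of string concatenation keeps identifiers containing spaces or reserved characters correctly encoded.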