OpenAI · AsyncAPI Specification
OpenAI Realtime API
Version 2024-10-01
The OpenAI Realtime API provides low-latency, bidirectional, event-driven communication with multimodal models that natively support speech-to-speech, text, and audio in a single conversation. This AsyncAPI document describes the **WebSocket** transport for the Realtime API, including all documented client-to-server events and server-to-client events. The Realtime API is currently in beta. Clients must include the `OpenAI-Beta: realtime=v1` header when connecting. Connection URL: wss://api.openai.com/v1/realtime?model={model} Events flow over a single full-duplex WebSocket connection. Every event has a top-level `type` and most events also carry an `event_id` correlation id.
Channels
session.update
Update session configuration.
Send by the client to update the session's default configuration (modalities, instructions, voice, audio formats, turn detection, tools, tool_choice, temperature, max_response_output_tokens).
input_audio_buffer.append
Append audio bytes to the input buffer.
Send by the client to append base64-encoded audio bytes to the input audio buffer. The default audio format is `pcm16` at 24 kHz.
input_audio_buffer.commit
Commit the input audio buffer.
Send by the client to commit the input audio buffer to the conversation as a user message. Required in non-VAD modes before requesting a response.
input_audio_buffer.clear
Clear the input audio buffer.
Send by the client to clear the input audio buffer without committing it.
conversation.item.create
Insert a conversation item.
Send by the client to insert a conversation item (a message, function_call, or function_call_output) into the conversation history.
conversation.item.truncate
Truncate an in-progress assistant item.
Send by the client to truncate the assistant audio of an in-progress response item. Used for interruption: audio after `audio_end_ms` is discarded and any text after that point is cleared.
conversation.item.delete
Delete a conversation item.
Send by the client to delete a conversation item by id.
response.create
Trigger a model response.
Send by the client to instruct the model to generate a response. Optionally overrides the session configuration for this single response.
response.cancel
Cancel an in-progress response.
Send by the client to cancel an in-progress response.
error
Receive an error event.
Server-emitted error envelope. Sent whenever a client event is invalid or the server encounters a problem processing a request.
session.created
Receive session.created.
Emitted by the server immediately after the WebSocket connection is authenticated. Contains the initial session configuration.
session.updated
Receive session.updated.
Emitted after the server applies a `session.update` from the client.
conversation.created
Receive conversation.created.
Emitted by the server when a new conversation is created on the session.
conversation.item.created
Receive conversation.item.created.
Emitted when a new conversation item has been added (either by the client or by the model generating a response).
conversation.item.input_audio_transcription.completed
Receive input_audio_transcription.completed.
Emitted when input audio transcription for a user audio item has completed (requires `input_audio_transcription` enabled on the session).
conversation.item.input_audio_transcription.failed
Receive input_audio_transcription.failed.
Emitted when input audio transcription fails for a user audio item.
conversation.item.truncated
Receive conversation.item.truncated.
Emitted after the server applies a `conversation.item.truncate` request from the client.
conversation.item.deleted
Receive conversation.item.deleted.
Emitted after the server applies a `conversation.item.delete` request.
input_audio_buffer.committed
Receive input_audio_buffer.committed.
Emitted when the input audio buffer is committed (either explicitly by the client via `input_audio_buffer.commit`, or implicitly by server VAD).
input_audio_buffer.cleared
Receive input_audio_buffer.cleared.
Emitted after the server clears the input audio buffer.
input_audio_buffer.speech_started
Receive input_audio_buffer.speech_started.
Emitted in server VAD mode when speech is detected starting in the input audio buffer.
input_audio_buffer.speech_stopped
Receive input_audio_buffer.speech_stopped.
Emitted in server VAD mode when speech is detected stopping in the input audio buffer.
response.created
Receive response.created.
Emitted when the server begins generating a response after a `response.create` (explicit) or after server VAD commits a user turn.
response.done
Receive response.done.
Emitted when a response has finished (status `completed`, `cancelled`, `failed`, or `incomplete`). Carries usage and final output items.
response.output_item.added
Receive response.output_item.added.
Emitted when a new output item is added to a response.
response.output_item.done
Receive response.output_item.done.
Emitted when an output item on a response is complete.
response.content_part.added
Receive response.content_part.added.
Emitted when a new content part (text, audio, or transcript) is added to an output item.
response.content_part.done
Receive response.content_part.done.
Emitted when a content part on an output item is complete.
response.text.delta
Receive response.text.delta.
Streaming text delta for a `text` content part on an assistant item.
response.text.done
Receive response.text.done.
Emitted when a `text` content part is fully generated.
response.audio_transcript.delta
Receive response.audio_transcript.delta.
Streaming transcript delta for an `audio` content part on an assistant item.
response.audio_transcript.done
Receive response.audio_transcript.done.
Emitted when the transcript for an `audio` content part is fully generated.
response.audio.delta
Receive response.audio.delta.
Streaming base64-encoded audio delta for an `audio` content part on an assistant item.
response.audio.done
Receive response.audio.done.
Emitted when an `audio` content part is fully generated. No final base64 payload is included; clients reassemble from the deltas.
response.function_call_arguments.delta
Receive response.function_call_arguments.delta.
Streaming delta for a tool/function call's `arguments` string.
response.function_call_arguments.done
Receive response.function_call_arguments.done.
Emitted when the `arguments` string for a function call is complete.
rate_limits.updated
Receive rate_limits.updated.
Emitted periodically with the current rate limit state for the connection (requests and tokens, remaining and reset_seconds).
Messages
SessionUpdate
session.update
Update session configuration.
InputAudioBufferAppend
input_audio_buffer.append
Append audio bytes to the input buffer.
InputAudioBufferCommit
input_audio_buffer.commit
Commit the input audio buffer.
InputAudioBufferClear
input_audio_buffer.clear
Clear the input audio buffer.
ConversationItemCreate
conversation.item.create
Insert a conversation item.
ConversationItemTruncate
conversation.item.truncate
Truncate an assistant item's audio.
ConversationItemDelete
conversation.item.delete
Delete a conversation item.
ResponseCreate
response.create
Trigger a model response.
ResponseCancel
response.cancel
Cancel an in-progress response.
Error
error
Server error.
SessionCreated
session.created
Session has been created.
SessionUpdated
session.updated
Session configuration updated.
ConversationCreated
conversation.created
Conversation created.
ConversationItemCreated
conversation.item.created
Conversation item created.
InputAudioTranscriptionCompleted
conversation.item.input_audio_transcription.completed
Input audio transcription completed.
InputAudioTranscriptionFailed
conversation.item.input_audio_transcription.failed
Input audio transcription failed.
ConversationItemTruncated
conversation.item.truncated
Conversation item truncated.
ConversationItemDeleted
conversation.item.deleted
Conversation item deleted.
InputAudioBufferCommitted
input_audio_buffer.committed
Input audio buffer committed.
InputAudioBufferCleared
input_audio_buffer.cleared
Input audio buffer cleared.
InputAudioBufferSpeechStarted
input_audio_buffer.speech_started
VAD speech started.
InputAudioBufferSpeechStopped
input_audio_buffer.speech_stopped
VAD speech stopped.
ResponseCreated
response.created
Response generation started.
ResponseDone
response.done
Response generation finished.
ResponseOutputItemAdded
response.output_item.added
New output item added to response.
ResponseOutputItemDone
response.output_item.done
Output item on response complete.
ResponseContentPartAdded
response.content_part.added
Content part added to output item.
ResponseContentPartDone
response.content_part.done
Content part on output item complete.
ResponseTextDelta
response.text.delta
Text delta for assistant message.
ResponseTextDone
response.text.done
Text content part complete.
ResponseAudioTranscriptDelta
response.audio_transcript.delta
Transcript delta for audio content part.
ResponseAudioTranscriptDone
response.audio_transcript.done
Transcript for audio content part complete.
ResponseAudioDelta
response.audio.delta
Base64 audio delta for audio content part.
ResponseAudioDone
response.audio.done
Audio content part complete.
ResponseFunctionCallArgumentsDelta
response.function_call_arguments.delta
Function-call arguments delta.
ResponseFunctionCallArgumentsDone
response.function_call_arguments.done
Function-call arguments complete.
RateLimitsUpdated
rate_limits.updated
Current rate limit state.
Servers
wss
production
api.openai.com/v1/realtime
OpenAI Realtime WebSocket endpoint. The `model` query parameter selects the underlying realtime-capable model (for example `gpt-4o-realtime-preview-2024-10-01`).