AsyncAPI description for the three WebSocket audio-streaming channels exposed by the Suki Speech Service (Suki for Partners). Each REST session-create call (Ambient, Dictation, Form Filling) returns an `audioWebsocketUrl` over which clinical audio is streamed as Base64-encoded PCM inside JSON frames. The same connection delivers control events (start/end of stream). Asynchronous results (clinical note, transcript, structured form data) are delivered out-of-band via REST polling or partner-hosted webhooks documented in the companion OpenAPI specifications. Sources: in-repo OpenAPI specs (openapi/suki-ambient-api-openapi.yml, openapi/suki-dictation-api-openapi.yml, openapi/suki-form-filling-api-openapi.yml) and live developer docs at https://developer.suki.ai/llms.txt and https://developer.suki.ai/documentation/audio-stream.
Stream encounter audio and control frames to Suki.
Ambient session audio channel. Streams microphone audio from the provider-patient encounter into Suki for ambient clinical note generation. Returned by `POST /ambient/sessions` as `audioWebsocketUrl`. Clients first publish a START_TIME control frame, then publish AUDIO frames with Base64-encoded PCM in the `data` field, and finally publish an AUDIO frame whose `data` is `"RU9G"` (Base64 "EOF") to terminate the stream.
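The frame order described above (START_TIME, then AUDIO chunks, then the `RU9G` terminator) can be sketched as a small frame builder. This is a minimal illustration, not the service's confirmed schema: only the `data` field and the START_TIME / AUDIO frame names come from this description, while the `type` key, the `startTime` field, and the `ambient_frames` helper are assumptions made for the example.

```python
import base64
import json

# Base64 of the ASCII bytes "EOF"; this is the terminator payload "RU9G".
EOF_MARKER = base64.b64encode(b"EOF").decode("ascii")


def ambient_frames(pcm_chunks, start_time_iso):
    """Yield JSON text frames in the documented order.

    Assumption: the "type" key and the "startTime" field are hypothetical
    names chosen for illustration; check the message schemas for the real ones.
    """
    # 1. Control frame that opens the stream.
    yield json.dumps({"type": "START_TIME", "startTime": start_time_iso})
    # 2. One AUDIO frame per raw PCM chunk, Base64-encoded in `data`.
    for chunk in pcm_chunks:
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"type": "AUDIO", "data": encoded})
    # 3. Final AUDIO frame whose `data` is the Base64 "EOF" marker.
    yield json.dumps({"type": "AUDIO", "data": EOF_MARKER})


frames = list(
    ambient_frames([b"\x00\x01\x02\x03", b"\x04\x05"], "2024-01-01T00:00:00Z")
)
```

Each yielded string would be sent as one text message over the `audioWebsocketUrl` connection by whichever WebSocket client the partner application uses.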
/api/v1/dictation/sessions/{sessionId}/audio
publish publishDictationAudio
Stream PCM_S16LE audio and the AUDIO_END terminator to Suki.
Dictation session audio channel. Streams clinician speech to Suki and receives partial and final transcriptions in real time. Returned by `POST /dictation/sessions` as `audioWebsocketUrl`. The socket opens only while the session is in READY or IDLE state.
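Because the socket opens only in the READY or IDLE state, a client may want to check the session state before connecting. The guard below is a sketch under assumptions: the state names READY and IDLE come from this description, but how the state is obtained (for example, from a REST session lookup) and the `ensure_connectable` helper are hypothetical.

```python
# States in which the dictation audio socket can be opened, per the
# channel description.
CONNECTABLE_STATES = frozenset({"READY", "IDLE"})


def ensure_connectable(session_state: str) -> None:
    """Raise if the session is not in a state that accepts a connection.

    Assumption: `session_state` is a plain string obtained elsewhere,
    for example from a REST session lookup (not shown here).
    """
    normalized = session_state.strip().upper()
    if normalized not in CONNECTABLE_STATES:
        raise RuntimeError(
            f"dictation socket unavailable in state {normalized!r}; "
            f"expected one of {sorted(CONNECTABLE_STATES)}"
        )


ensure_connectable("READY")  # passes silently
```

Failing fast here avoids a rejected WebSocket handshake and gives the caller a clear reason to retry once the session returns to a connectable state.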
/api/v1/form-filling/sessions/{sessionId}/audio
publish publishFormFillingAudio
Stream form-filling voice input to Suki.
Form-filling session audio channel. Streams voice input that Suki maps into the structured fields of the form template attached to the session. Returned by `POST /form-filling/sessions` as `audioWebsocketUrl`. Uses the same Base64 PCM JSON framing as the ambient channel.
Messages
AmbientStartTime
Start-of-stream control frame
First frame sent on an ambient or form-filling session.
AmbientAudioFrame
Ambient audio frame
Base64-encoded PCM audio chunk. To terminate the stream, send a frame whose `data` is `RU9G` (Base64 for the ASCII bytes "EOF").
DictationAudioFrame
Dictation audio frame
Base64-encoded PCM_S16LE audio chunk.
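PCM_S16LE means signed 16-bit samples in little-endian byte order. As an illustration of producing such a chunk, the sketch below converts floating-point samples to PCM_S16LE bytes and Base64-encodes them. The `floats_to_pcm_s16le` helper and the float input format are assumptions for the example; sample rate and channel count are not stated here and are left out.

```python
import base64
import struct


def floats_to_pcm_s16le(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    # Scale to the 16-bit range and clamp so out-of-range input cannot overflow.
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    # "<" selects little-endian; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = floats_to_pcm_s16le([0.0, 1.0, -1.0])
encoded = base64.b64encode(pcm).decode("ascii")  # value for the audio payload
```

Audio captured in another sample format (for example 32-bit float from a browser or sound card) would need a conversion of this kind before being framed.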
DictationAudioEnd
Dictation end-of-stream control frame
Terminates a dictation audio stream.
TranscriptionStreamResponse
Dictation transcript event
Partial or final transcription delivered while audio is streaming.
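One common way to consume a mix of partial and final results is to keep finalized segments and let each new partial overwrite the previous one. The sketch below shows that pattern only. The payload fields `text` and `isFinal` and the `TranscriptBuffer` class are hypothetical, since the event schema is not given in this description.

```python
import json


class TranscriptBuffer:
    """Accumulate final segments; keep only the latest partial."""

    def __init__(self):
        self.final_segments = []
        self.pending = ""

    def apply(self, raw_event: str) -> str:
        # Assumption: events are JSON objects with hypothetical
        # "text" and "isFinal" fields.
        event = json.loads(raw_event)
        text = event.get("text", "")
        if event.get("isFinal", False):
            # A final result replaces the in-progress partial.
            self.final_segments.append(text)
            self.pending = ""
        else:
            # A newer partial supersedes the previous partial.
            self.pending = text
        return self.current_text()

    def current_text(self) -> str:
        parts = self.final_segments + ([self.pending] if self.pending else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.apply('{"text": "patient", "isFinal": false}')
buf.apply('{"text": "patient reports pain", "isFinal": true}')
display = buf.apply('{"text": "no", "isFinal": false}')
```

With this approach the displayed text is always the confirmed transcript followed by the most recent unconfirmed partial.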
StreamStatusEvent
Stream status event
Server-emitted status updates (e.g. acknowledgements, errors, stream lifecycle). Final clinical content for ambient and form-filling sessions is fetched via REST or delivered via partner-hosted webhooks documented in the companion OpenAPI specs.