fal · AsyncAPI Specification
fal Event-Driven APIs
Version 1.0.0
AsyncAPI description of fal's event-driven inference surfaces. fal exposes two real-time channels in addition to its REST queue: (1) a Server-Sent Events stream that pushes incremental status updates for any queued model request, and (2) a bi-directional WebSocket channel used by the Realtime Inference API for ultra-low-latency interactive models such as `fast-lcm-diffusion`, `fast-turbo-diffusion`, and `fast-sdxl`. The WebSocket channel is the same surface driven by the official fal-js / fal-client SDK `realtime` helpers.
Tags: AI, Artificial Intelligence, Generative AI, Generative Media, Image Generation, Video Generation, Audio Generation, Inference, Serverless, GPU, MCP, AsyncAPI, Webhooks, Events
Channels
{model_id}/requests/{request_id}/status/stream
Subscribe to queue status events for a submitted request.
Server-Sent Events stream of queue status updates for a single submitted request. The connection remains open and emits one event per state change until the request reaches `COMPLETED`. Enable runner logs by adding `?logs=1` to the query string.
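A minimal sketch of consuming this stream, assuming each SSE `data:` field carries a JSON object with a `status` field. That field name is an assumption; the text above only names the states. The parser takes any iterable of text lines, so it can be fed from an HTTP client's line iterator. `status_stream_url` and `iter_status_events` are hypothetical helpers, not part of an official SDK.

```python
import json
from typing import Iterable, Iterator
from urllib.parse import quote

QUEUE_HOST = "https://queue.fal.run"  # host taken from the Servers section


def status_stream_url(model_id: str, request_id: str, logs: bool = False) -> str:
    """Build the SSE URL for one submitted request (hypothetical helper)."""
    url = f"{QUEUE_HOST}/{model_id}/requests/{quote(request_id, safe='')}/status/stream"
    return url + "?logs=1" if logs else url


def iter_status_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from SSE text lines, stopping after COMPLETED."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the event; multi-line data joins with "\n".
            event = json.loads("\n".join(data_parts))
            data_parts = []
            yield event
            # Assumed field name: "status".
            if event.get("status") == "COMPLETED":
                return
```

Because `COMPLETED` is the terminal event, the generator returns right after yielding it, and the caller can then close the underlying HTTP connection.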
{app_id}/realtime
Send an inference input frame.
Bi-directional WebSocket channel for realtime inference. Clients send one input message per generation step and receive zero or more partial or final output frames per step. The default path is `/realtime`; some apps expose custom paths configurable through the SDK `path` option. Messages are serialized as JSON by default and MAY be serialized as MessagePack (msgpack) when using the official SDKs, which is more efficient for binary image payloads.
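As a rough illustration of the default JSON framing, the sketch below encodes one input frame and decodes a frame received as text or UTF-8 bytes. Both helper names are hypothetical, and the accepted input fields are model-specific. MessagePack is left out on purpose because it needs a third-party library.

```python
import json
from typing import Any, Union


def encode_input(payload: dict[str, Any]) -> str:
    """Serialize one input frame as compact JSON text."""
    return json.dumps(payload, separators=(",", ":"))


def decode_frame(raw: Union[str, bytes]) -> dict[str, Any]:
    """Decode a JSON frame received as text or UTF-8 bytes."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    frame = json.loads(text)
    if not isinstance(frame, dict):
        raise ValueError("expected a JSON object frame")
    return frame
```

A client would call `encode_input` once per generation step and `decode_frame` on each of the zero or more frames that come back for that step.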
Messages
QueueStatusInQueue
Queue Status — IN_QUEUE
Request has been received and persisted; waiting for an available runner.
QueueStatusInProgress
Queue Status — IN_PROGRESS
fal's dispatcher has routed the request to a runner.
QueueStatusCompleted
Queue Status — COMPLETED
Result is stored and available for retrieval at `response_url` (or was POSTed to the configured webhook). This is the terminal event of the stream.
RealtimeInput
Realtime Inference Input
Inference input frame. The accepted fields are defined by the OpenAPI schema of the target model — see the model's playground page on https://fal.ai/models for the canonical schema.
RealtimeResult
Realtime Inference Result
Inference output frame. Fields are model-specific; image-generation apps return an `images` array. The `request_id` echoes the inference invocation it corresponds to.
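Since one input can produce several output frames, a client may want to collect frames under the `request_id` they echo. The following is a sketch under the assumption that frames have already been decoded into dictionaries; `group_by_request` is a hypothetical helper, and frames without a `request_id` are simply skipped.

```python
from collections import defaultdict
from typing import Any


def group_by_request(frames: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Collect result frames under the request_id they echo, in arrival order."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for frame in frames:
        request_id = frame.get("request_id")
        if request_id is None:
            continue  # not treated as a result frame in this sketch
        grouped[request_id].append(frame)
    return dict(grouped)
```

Keeping arrival order within each group lets the caller treat the last frame for a given `request_id` as the most recent output for that invocation.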
RealtimeError
Realtime Error (x-fal-error)
Inference or framework-level error returned by the realtime runner.
RealtimeUnauthorized
Realtime Unauthorized
Sent when the supplied credentials (proxy headers or JWT) cannot be verified. The connection is closed by the server after this frame.
Servers
queue-sse (https): queue.fal.run
Queue status streaming server. Emits Server-Sent Events for any submitted queue request until the request reaches the `COMPLETED` status.
realtime-ws (wss): fal.run
Realtime WebSocket inference server. Clients authenticate either through a server-side proxy URL that injects the `Authorization: Key $FAL_KEY` header, or with a short-lived JWT passed as the `fal_jwt_token` query parameter.
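For the JWT route, the connection URL can be assembled roughly as follows. This is a sketch: `realtime_url` is a hypothetical helper, the token is assumed to have been obtained elsewhere, and the URL layout is inferred from the `{app_id}/realtime` channel and the `fal.run` host listed above.

```python
from typing import Optional
from urllib.parse import urlencode

REALTIME_HOST = "wss://fal.run"  # host taken from the Servers section


def realtime_url(app_id: str, token: Optional[str] = None, path: str = "/realtime") -> str:
    """Build the WebSocket URL, optionally carrying a short-lived JWT."""
    url = f"{REALTIME_HOST}/{app_id.strip('/')}/{path.strip('/')}"
    if token:
        # The token travels as the fal_jwt_token query parameter.
        url += "?" + urlencode({"fal_jwt_token": token})
    return url
```

With the proxy route, the client would connect to the proxy's own URL instead and send no token, since the proxy injects the `Authorization` header server-side.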