fal · AsyncAPI Specification

fal Event-Driven APIs

Version 1.0.0

AsyncAPI description of fal's event-driven inference surfaces. fal exposes two real-time channels in addition to its REST queue: (1) a Server-Sent Events stream that pushes incremental status updates for any queued model request, and (2) a bi-directional WebSocket channel used by the Realtime Inference API for ultra-low-latency interactive models such as `fast-lcm-diffusion`, `fast-turbo-diffusion`, and `fast-sdxl`. The WebSocket channel is the same surface driven by the official fal-js / fal-client SDK `realtime` helpers.

Tags: AI, Artificial Intelligence, Generative AI, Generative Media, Image Generation, Video Generation, Audio Generation, Inference, Serverless, GPU, MCP, AsyncAPI, Webhooks, Events

Channels

{model_id}/requests/{request_id}/status/stream
subscribe subscribeQueueStatusStream
Subscribe to queue status events for a submitted request.
Server-Sent Events stream of queue status updates for a single submitted request. The connection remains open and emits one event per state change until the request reaches `COMPLETED`. Enable runner logs by adding `?logs=1` to the query string.
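As a rough illustration of consuming this stream, the sketch below turns Server-Sent Events text lines into status objects and stops at the terminal `COMPLETED` event. It assumes each event carries one JSON object in its `data:` field, which is what the JSON content type in this spec suggests. `iter_status_events` is a hypothetical helper, not part of any fal SDK, and the HTTP transport is left to the caller.

```python
import json
from typing import Iterable, Iterator


def iter_status_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded status payloads from SSE text lines until COMPLETED.

    Hypothetical helper: assumes every event's `data:` field holds one JSON
    object shaped like the QueueStatus* schemas in this spec.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
            continue
        if line == "" and data_parts:
            # A blank line dispatches the buffered event.
            event = json.loads("\n".join(data_parts))
            data_parts = []
            yield event
            if event.get("status") == "COMPLETED":
                return  # terminal event of the stream
        # Any other field (event:, id:, retry:) is ignored in this sketch.
```

Any iterable of decoded text lines works as input, for example the line iterator of a streaming HTTP response opened with `Accept: text/event-stream`.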
{app_id}/realtime
publish sendRealtimeInput
Send an inference input frame.
subscribe receiveRealtimeOutput
Receive inference outputs and connection control messages.
Bi-directional WebSocket channel for realtime inference. Clients send one input message per generation step and receive zero or more partial or final output frames per step. The default path is `/realtime`; some apps expose custom paths configurable through the SDK `path` option. Messages are serialized as JSON by default and MAY be serialized as MessagePack (msgpack) when using the official SDKs, which is more efficient for binary image payloads.
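To make the "one input message per generation step" idea concrete, here is a minimal sketch that serializes a single input frame as JSON, the default encoding named above. `encode_input_frame` is a hypothetical helper; the field names `prompt`, `seed`, and `sync_mode` come from the exemplar fields in this spec, and the choice to drop unset options rests on the assumption that the model applies its own defaults for omitted fields.

```python
import json
from typing import Any


def encode_input_frame(prompt: str, **fields: Any) -> str:
    """Serialize one realtime input message as a JSON text frame.

    Hypothetical helper. Accepted fields are model-specific; consult the
    target model's schema for the authoritative list.
    """
    frame: dict[str, Any] = {"prompt": prompt}
    # Leave out options set to None (assumes the model has defaults for them).
    frame.update({k: v for k, v in fields.items() if v is not None})
    return json.dumps(frame, separators=(",", ":"))
```

The returned string would be sent as one WebSocket text frame per generation step. MessagePack encoding is not shown here because it needs a third-party library.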

Messages

QueueStatusInQueue
Queue Status — IN_QUEUE
Request has been received and persisted; waiting for an available runner.
QueueStatusInProgress
Queue Status — IN_PROGRESS
fal's dispatcher has routed the request to a runner.
QueueStatusCompleted
Queue Status — COMPLETED
Result is stored and available for retrieval at `response_url` (or was POSTed to the configured webhook). This is the terminal event of the stream.
RealtimeInput
Realtime Inference Input
Inference input frame. The accepted fields are defined by the OpenAPI schema of the target model — see the model's playground page on https://fal.ai/models for the canonical schema.
RealtimeResult
Realtime Inference Result
Inference output frame. Fields are model-specific; image-generation apps return an `images` array. The `request_id` identifies the inference invocation that produced the frame.
RealtimeError
Realtime Error (x-fal-error)
Inference or framework-level error returned by the realtime runner.
RealtimeUnauthorized
Realtime Unauthorized
Sent when the supplied credentials (proxy headers or JWT) cannot be verified. The connection is closed by the server after this frame.
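The three realtime message types above can be told apart by their discriminating fields. The sketch below shows one way to do that for JSON text frames; `classify_frame` is a hypothetical helper, and it treats any frame that matches neither the error nor the unauthorized shape as a result, which is an assumption rather than documented behavior.

```python
import json


def classify_frame(text: str) -> str:
    """Return 'unauthorized', 'error', or 'result' for a JSON text frame.

    Hypothetical helper keyed on the schema fields described in this spec:
    RealtimeUnauthorized has status == "error" and error == "Unauthorized";
    RealtimeError has type == "x-fal-error".
    """
    frame = json.loads(text)
    if frame.get("status") == "error" and frame.get("error") == "Unauthorized":
        return "unauthorized"  # the server closes the connection after this
    if frame.get("type") == "x-fal-error":
        return "error"
    return "result"  # assumed fallback: model-specific output frame
```

A client loop could use the returned label to decide whether to render the output, surface the `reason` field, or refresh credentials and reconnect.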

Servers

https
queue-sse queue.fal.run
Queue status streaming server. Emits Server-Sent Events for any submitted queue request until the request reaches the `COMPLETED` status.
wss
realtime-ws fal.run
Realtime WebSocket inference server. Authenticated either via a server-side proxy URL that injects the `Authorization: Key $FAL_KEY` header, or via a short-lived JWT passed as the `fal_jwt_token` query parameter.
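Putting the two servers and the channel templates together, the following sketch builds the stream and WebSocket URLs and the key-based header. The helper names are hypothetical, and the URL shapes are an assumption: they simply join the hosts and channel paths listed in this spec.

```python
from urllib.parse import quote, urlencode


def queue_status_stream_url(model_id: str, request_id: str, logs: bool = False) -> str:
    """Build the SSE status stream URL on the queue-sse server."""
    base = (
        f"https://queue.fal.run/{model_id}"
        f"/requests/{quote(request_id, safe='')}/status/stream"
    )
    return f"{base}?logs=1" if logs else base


def realtime_url(app_id: str, jwt_token: str, path: str = "realtime") -> str:
    """Build the WebSocket URL for direct (non-proxy) connections."""
    query = urlencode({"fal_jwt_token": jwt_token})
    return f"wss://fal.run/{app_id}/{path.lstrip('/')}?{query}"


def key_auth_header(fal_key: str) -> dict:
    """Header that a server-side proxy would inject; keep the key off clients."""
    return {"Authorization": f"Key {fal_key}"}
```

For example, `queue_status_stream_url("fal-ai/fast-sdxl", request_id, logs=True)` yields a URL ending in `/status/stream?logs=1`.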

AsyncAPI Specification

asyncapi: '2.6.0'
id: 'urn:com:fal:event-apis'
info:
  title: fal Event-Driven APIs
  version: '1.0.0'
  description: >
    AsyncAPI description of fal's event-driven inference surfaces. fal exposes
    two real-time channels in addition to its REST queue: (1) a Server-Sent
    Events stream that pushes incremental status updates for any queued model
    request, and (2) a bi-directional WebSocket channel used by the Realtime
    Inference API for ultra-low-latency interactive models such as
    `fast-lcm-diffusion`, `fast-turbo-diffusion`, and `fast-sdxl`. The
    WebSocket channel is the same surface driven by the official
    fal-js / fal-client SDK `realtime` helpers.
  contact:
    name: fal Support
    url: https://fal.ai/docs
  license:
    name: Proprietary
    url: https://fal.ai/legal/terms-of-service
  x-apis-json:
    humanURL: https://fal.ai/docs/model-apis/real-time
    baseURL: wss://fal.run
defaultContentType: application/json
servers:
  queue-sse:
    url: queue.fal.run
    protocol: https
    description: >
      Queue status streaming server. Emits Server-Sent Events for any submitted
      queue request until the request reaches the `COMPLETED` status.
  realtime-ws:
    url: fal.run
    protocol: wss
    description: >
      Realtime WebSocket inference server. Authenticated either via a
      server-side proxy URL that injects the `Authorization: Key $FAL_KEY`
      header, or via a short-lived JWT passed as the `fal_jwt_token` query
      parameter.
channels:
  '{model_id}/requests/{request_id}/status/stream':
    description: >
      Server-Sent Events stream of queue status updates for a single submitted
      request. The connection remains open and emits one event per state
      change until the request reaches `COMPLETED`. Enable runner logs by
      adding `?logs=1` to the query string.
    servers:
      - queue-sse
    parameters:
      model_id:
        description: >
          Fully-qualified fal model identifier (e.g. `fal-ai/flux/dev`,
          `fal-ai/fast-sdxl`, `fal-ai/veo3`).
        schema:
          type: string
      request_id:
        description: Queue request identifier returned by the original submit call.
        schema:
          type: string
          format: uuid
    subscribe:
      operationId: subscribeQueueStatusStream
      summary: Subscribe to queue status events for a submitted request.
      description: >
        Opened with an HTTP `GET` request that sends the
        `Accept: text/event-stream` header.
      bindings:
        http:
          type: request
          method: GET
          bindingVersion: '0.1.0'
      message:
        oneOf:
          - $ref: '#/components/messages/QueueStatusInQueue'
          - $ref: '#/components/messages/QueueStatusInProgress'
          - $ref: '#/components/messages/QueueStatusCompleted'

  '{app_id}/realtime':
    description: >
      Bi-directional WebSocket channel for realtime inference. Clients send
      one input message per generation step and receive zero or more partial
      or final output frames per step. The default path is `/realtime`; some
      apps expose custom paths configurable through the SDK `path` option.
      Messages are serialized as JSON by default and MAY be serialized as
      MessagePack (msgpack) when using the official SDKs, which is more
      efficient for binary image payloads.
    servers:
      - realtime-ws
    parameters:
      app_id:
        description: >
          Realtime-capable fal app id, e.g. `fal-ai/fast-lcm-diffusion`,
          `fal-ai/fast-turbo-diffusion`, or `fal-ai/fast-sdxl`.
        schema:
          type: string
    bindings:
      ws:
        bindingVersion: '0.1.0'
        query:
          type: object
          properties:
            fal_jwt_token:
              type: string
              description: >
                Short-lived JWT minted by your backend via the fal token
                endpoint. Required when not connecting through a proxy URL.
            max_buffering:
              type: integer
              description: >
                Optional maximum number of input frames the server may buffer
                before back-pressuring.
    publish:
      operationId: sendRealtimeInput
      summary: Send an inference input frame.
      description: >
        Each message represents a single inference invocation against the
        connected app. Field names follow the OpenAPI schema of the chosen
        model (e.g. `prompt`, `image_url`, `seed`, `num_inference_steps`,
        `strength`, `sync_mode`).
      message:
        $ref: '#/components/messages/RealtimeInput'
    subscribe:
      operationId: receiveRealtimeOutput
      summary: Receive inference outputs and connection control messages.
      message:
        oneOf:
          - $ref: '#/components/messages/RealtimeResult'
          - $ref: '#/components/messages/RealtimeError'
          - $ref: '#/components/messages/RealtimeUnauthorized'
components:
  messages:
    QueueStatusInQueue:
      name: QueueStatusInQueue
      title: Queue Status — IN_QUEUE
      summary: >
        Request has been received and persisted; waiting for an available
        runner.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/QueueStatusInQueue'

    QueueStatusInProgress:
      name: QueueStatusInProgress
      title: Queue Status — IN_PROGRESS
      summary: fal's dispatcher has routed the request to a runner.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/QueueStatusInProgress'

    QueueStatusCompleted:
      name: QueueStatusCompleted
      title: Queue Status — COMPLETED
      summary: >
        Result is stored and available for retrieval at `response_url` (or
        was POSTed to the configured webhook). This is the terminal event of
        the stream.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/QueueStatusCompleted'

    RealtimeInput:
      name: RealtimeInput
      title: Realtime Inference Input
      summary: >
        Inference input frame. The accepted fields are defined by the
        OpenAPI schema of the target model — see the model's playground page
        on https://fal.ai/models for the canonical schema.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/RealtimeInput'

    RealtimeResult:
      name: RealtimeResult
      title: Realtime Inference Result
      summary: >
        Inference output frame. Fields are model-specific; image-generation
        apps return an `images` array. The `request_id` identifies the
        inference invocation that produced the frame.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/RealtimeResult'

    RealtimeError:
      name: RealtimeError
      title: Realtime Error (x-fal-error)
      summary: Inference or framework-level error returned by the realtime runner.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/RealtimeError'

    RealtimeUnauthorized:
      name: RealtimeUnauthorized
      title: Realtime Unauthorized
      summary: >
        Sent when the supplied credentials (proxy headers or JWT) cannot be
        verified. The connection is closed by the server after this frame.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/RealtimeUnauthorized'

  schemas:
    QueueStatusInQueue:
      type: object
      required: [status, request_id]
      properties:
        status:
          type: string
          const: IN_QUEUE
        request_id:
          type: string
          format: uuid
          description: Unique request identifier.
        queue_position:
          type: integer
          minimum: 0
          description: Number of requests ahead of this one in the queue.
        response_url:
          type: string
          format: uri
          description: URL where the final result will be retrievable.

    QueueStatusInProgress:
      type: object
      required: [status, request_id]
      properties:
        status:
          type: string
          const: IN_PROGRESS
        request_id:
          type: string
          format: uuid
        response_url:
          type: string
          format: uri
        logs:
          type: array
          description: >
            Runner log lines. Only populated when the stream URL is opened
            with the `?logs=1` query parameter.
          items:
            $ref: '#/components/schemas/QueueLogEntry'

    QueueStatusCompleted:
      type: object
      required: [status, request_id]
      properties:
        status:
          type: string
          const: COMPLETED
        request_id:
          type: string
          format: uuid
        response_url:
          type: string
          format: uri
        logs:
          type: array
          items:
            $ref: '#/components/schemas/QueueLogEntry'
        metrics:
          type: object
          properties:
            inference_time:
              type: number
              description: Processing duration in seconds.

    QueueLogEntry:
      type: object
      properties:
        message:
          type: string
        level:
          type: string
          description: Log level emitted by the runner (e.g. `INFO`, `ERROR`).
        timestamp:
          type: string
          format: date-time

    RealtimeInput:
      type: object
      description: >
        Free-form, model-specific input frame. The exemplar properties below
        are the most common across realtime image-generation apps; consult
        the model OpenAPI schema for the complete and authoritative list.
      additionalProperties: true
      properties:
        prompt:
          type: string
        negative_prompt:
          type: string
        image_url:
          type: string
          description: >
            Reference image URL, or `data:` URI carrying a base64-encoded
            image when `sync_mode` is `true`.
        seed:
          type: integer
        num_inference_steps:
          type: integer
        strength:
          type: number
        sync_mode:
          type: boolean
          description: >
            When `true`, responses are returned as base64-encoded payloads
            on the same WebSocket frame rather than via CDN URLs.

    RealtimeResult:
      type: object
      description: >
        Model-specific output frame. Fields shown reflect the
        image-generation realtime apps (fast-lcm-diffusion,
        fast-turbo-diffusion, fast-sdxl). Other modalities return their own
        modality-specific fields.
      additionalProperties: true
      properties:
        request_id:
          type: string
          description: >
            Identifies the inference invocation that produced this output.
        images:
          type: array
          items:
            type: object
            properties:
              url:
                type: string
                description: >
                  CDN URL of the generated image, or a `data:` URI when
                  `sync_mode: true` was supplied.
              width:
                type: integer
              height:
                type: integer
              content_type:
                type: string
        timings:
          type: object
          additionalProperties:
            type: number
        seed:
          type: integer
        has_nsfw_concepts:
          type: array
          items:
            type: boolean

    RealtimeError:
      type: object
      required: [type]
      properties:
        type:
          type: string
          const: x-fal-error
        error:
          type: string
          description: Short error code or label.
        reason:
          type: string
          description: Human-readable reason describing the failure.

    RealtimeUnauthorized:
      type: object
      required: [status, error]
      properties:
        status:
          type: string
          const: error
        error:
          type: string
          const: Unauthorized

  securitySchemes:
    falKey:
      type: httpApiKey
      in: header
      name: Authorization
      description: >
        `Authorization: Key $FAL_KEY` header. Used on the SSE queue stream
        and on the realtime WebSocket when the connection is routed through
        a server-side proxy.
    falJwt:
      type: httpApiKey
      in: query
      name: fal_jwt_token
      description: >
        Short-lived JWT supplied as the `fal_jwt_token` query parameter on
        the realtime WebSocket URL. Mint via your backend's fal token
        endpoint.